{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Learning - Bipolar Disorder Detection\n",
    "<br>\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
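    "## 1.1 Background\n",
    "\n",
    "Bipolar disorder (BD), also known as bipolar affective disorder, is a type of mood disorder characterized by both manic episodes and depressive episodes.\n",
    "\n",
    "Its cause is not yet known. Biological, psychological, and social-environmental factors are all thought to be involved in its onset.\n",
    "\n",
    "Current research finds that the interaction between genetic factors and environmental or stress factors, as well as the point in time at which that interaction occurs, strongly influences the development of bipolar disorder. Clinical presentations are classified by episode type as depressive, manic, or mixed episodes.\n",
    "\n",
    "Bipolar disorder detection means using medical test data to predict whether a patient has bipolar disorder, or whether treatment for bipolar disorder is effective.\n",
    "\n",
    "The medical data consist of medical imaging data and gut (intestinal) data.\n",
    "\n",
    "Because medical samples are scarce and the features are numerous, there is strong practical demand for, and real medical value in, selecting suitable features, integrating the features of the two modalities, and training a suitable classifier for prediction.\n",
    "\n",
    "In this experiment you will perform supervised learning with few samples and many features."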
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
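    "## 1.2 Requirements\n",
    "\n",
    "a) Select, extract, and integrate features from both modalities.\n",
    "\n",
    "b) Choose and train a machine learning model that classifies accurately.\n",
    "\n",
    "c) Analyze how different hyperparameters and feature selection methods affect the model's results."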
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
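    "## 1.3 Environment\n",
    "\n",
    "You can use NumPy for numerical computation and scikit-learn for feature selection and for training machine learning models.\n",
    "\n",
    "## 1.4 Notes\n",
    "+ For how to use Python and Python packages, see the API documentation (`API文档`) panel on the right.\n",
    "+ If 'Python 3' in the top-right corner shows as running for a long time and code will not execute, restart the kernel to fix it (top-left 'Kernel' > 'Restart Kernel')."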
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.5 References\n",
    "\n",
    "NumPy: https://www.numpy.org/\n",
    "\n",
    "Scikit-learn: https://scikit-learn.org/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
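    "## 2.1 Loading the Data\n",
    "\n",
    "The medical dataset is stored in `DataSet.xlsx` in the left sidebar. It contains 39 samples and 3 sheets: sheet `Feature1` holds the medical imaging features, sheet `Feature2` holds the gut features, and sheet `label` holds the class labels."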
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required libraries\n",
    "import warnings\n",
    "import itertools\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from time import time\n",
    "from minepy import MINE\n",
    "from sklearn import svm\n",
    "from sklearn import tree\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn import naive_bayes\n",
    "from scipy.stats import pearsonr\n",
    "from sklearn.manifold import TSNE\n",
    "from IPython.display import display\n",
    "from datetime import datetime as dt\n",
    "import joblib  # sklearn.externals.joblib is deprecated since scikit-learn 0.21; use the standalone package\n",
    "from sklearn.decomposition import PCA\n",
    "from sklearn.metrics import fbeta_score\n",
    "from sklearn.metrics import make_scorer\n",
    "from sklearn.metrics import recall_score\n",
    "from sklearn.model_selection import KFold\n",
    "from sklearn.feature_selection import chi2\n",
    "from sklearn.metrics import accuracy_score\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from sklearn.model_selection import ShuffleSplit\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "from sklearn.model_selection import learning_curve\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "warnings.filterwarnings('ignore')\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Feature1 sheet shape: (39, 6670)\n",
      "Feature2 sheet shape: (39, 377)\n",
      "label    sheet shape: (39, 1)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>6660</th>\n",
       "      <th>6661</th>\n",
       "      <th>6662</th>\n",
       "      <th>6663</th>\n",
       "      <th>6664</th>\n",
       "      <th>6665</th>\n",
       "      <th>6666</th>\n",
       "      <th>6667</th>\n",
       "      <th>6668</th>\n",
       "      <th>6669</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.816394</td>\n",
       "      <td>0.313184</td>\n",
       "      <td>0.437542</td>\n",
       "      <td>0.421138</td>\n",
       "      <td>0.54941</td>\n",
       "      <td>0.740194</td>\n",
       "      <td>-0.097087</td>\n",
       "      <td>0.005081</td>\n",
       "      <td>0.009196</td>\n",
       "      <td>0.105606</td>\n",
       "      <td>...</td>\n",
       "      <td>0.548366</td>\n",
       "      <td>0.122165</td>\n",
       "      <td>0.289676</td>\n",
       "      <td>0.21173</td>\n",
       "      <td>0.364212</td>\n",
       "      <td>0.163357</td>\n",
       "      <td>0.282966</td>\n",
       "      <td>0.212861</td>\n",
       "      <td>0.417709</td>\n",
       "      <td>0.523209</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 6670 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       0         1         2         3        4         5         6     \\\n",
       "0  0.816394  0.313184  0.437542  0.421138  0.54941  0.740194 -0.097087   \n",
       "\n",
       "       7         8         9     ...      6660      6661      6662     6663  \\\n",
       "0  0.005081  0.009196  0.105606  ...  0.548366  0.122165  0.289676  0.21173   \n",
       "\n",
       "       6664      6665      6666      6667      6668      6669  \n",
       "0  0.364212  0.163357  0.282966  0.212861  0.417709  0.523209  \n",
       "\n",
       "[1 rows x 6670 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>367</th>\n",
       "      <th>368</th>\n",
       "      <th>369</th>\n",
       "      <th>370</th>\n",
       "      <th>371</th>\n",
       "      <th>372</th>\n",
       "      <th>373</th>\n",
       "      <th>374</th>\n",
       "      <th>375</th>\n",
       "      <th>376</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>100.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.38706</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.07677</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>21.16578</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 377 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     0    1    2        3    4    5    6    7    8    9    ...  367  368  369  \\\n",
       "0  100.0  0.0  0.0  0.38706  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   \n",
       "\n",
       "   370      371  372  373       374  375  376  \n",
       "0  0.0  0.07677  0.0  0.0  21.16578    0  0.0  \n",
       "\n",
       "[1 rows x 377 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0\n",
       "0  1"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Load the medical data\n",
    "data_xls = pd.ExcelFile('DataSet.xlsx')\n",
    "data = {}\n",
    "\n",
    "# Print the name and shape of each sheet\n",
    "for name in data_xls.sheet_names:\n",
    "    df = data_xls.parse(sheet_name=name, header=None)\n",
    "    print(\"%-8s sheet shape:\" % name, df.shape)\n",
    "    data[name] = df\n",
    "\n",
    "# Extract feature set 1, feature set 2, and the labels\n",
    "feature1_raw = data['Feature1']\n",
    "feature2_raw = data['Feature2']\n",
    "label = data['label']\n",
    "\n",
    "# Display the first sample of each table\n",
    "display(feature1_raw.head(n=1))\n",
    "display(feature2_raw.head(n=1))\n",
    "display(label.head(n=1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
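    "As shown above, the number of features in the medical data vastly exceeds the number of samples.\n",
    "\n",
    "The medical imaging data have 6670 dimensions and the gut data have 377 dimensions, but there are only 39 samples. Positive samples are labeled 1 and negative samples are labeled -1.\n",
    "\n",
    "Therefore, selecting and combining features, and choosing and tuning the machine learning model, are critical to model performance."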
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Preparing the Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
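    "**Data preprocessing** is a data mining technique that converts raw data into a format suitable for analysis. It typically involves operations such as data cleaning, transformation, organization, dimensionality reduction, and formatting.\n",
    "\n",
    "This dataset has no invalid or missing entries; however, we still need to select and integrate the features.\n",
    "\n",
    "We can also adjust certain features according to their characteristics.\n",
    "\n",
    "Such preprocessing can substantially improve the performance and predictive power of machine learning models.\n"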
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
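    "**Normalizing numerical features**\n",
    "\n",
    "Applying some form of scaling to numerical features reduces the influence of differing units and magnitudes.\n",
    "\n",
    "Inspecting the data shows that the values in `Feature2` differ greatly in scale, for example between column 0 and column 374. Try checking whether other columns show the same pattern.\n",
    "\n",
    "Why normalize the data:\n",
    "\n",
    "1. It maps values into a small range such as [0, 1] or [-1, 1], which makes the data more convenient and faster to process.\n",
    "2. It turns dimensional quantities into dimensionless ones, so that indicators with different units or magnitudes can be compared and weighted.\n",
    "\n",
    "Note: once scaling is applied, the values no longer carry their original meaning.\n",
    "\n",
    "We will use [`sklearn.preprocessing.MinMaxScaler`](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) for this task."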
   ]
  },
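  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of what `MinMaxScaler` does, the sketch below applies the min-max formula $x' = \\frac{x - \\min(x)}{\\max(x) - \\min(x)}$ column by column to a small made-up matrix (illustrative values, not taken from `DataSet.xlsx`) and compares the result with scikit-learn's output. The `_demo` variable names are hypothetical and chosen so the cell does not overwrite notebook state."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "\n",
    "# Toy data (illustrative only): column 0 has a large range and column 1 a tiny one,\n",
    "# mimicking the scale gap between some Feature2 columns\n",
    "X_demo = np.array([[100.0, 0.0],\n",
    "                   [50.0, 0.2],\n",
    "                   [0.0, 0.4]])\n",
    "\n",
    "# Min-max formula applied column-wise: (x - min) / (max - min)\n",
    "col_min = X_demo.min(axis=0)\n",
    "col_max = X_demo.max(axis=0)\n",
    "X_manual = (X_demo - col_min) / (col_max - col_min)\n",
    "\n",
    "# scikit-learn's MinMaxScaler with the default feature_range=(0, 1)\n",
    "X_sklearn = MinMaxScaler().fit_transform(X_demo)\n",
    "\n",
    "print(X_manual)\n",
    "print(np.allclose(X_manual, X_sklearn))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One practical difference: for a constant column (max equal to min) the bare formula divides by zero, whereas `MinMaxScaler` guards against this and maps such a column to the lower end of `feature_range` (0 by default)."
   ]
  },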
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import MinMaxScaler\n",
    "\n",
    "\n",
    "def processing_data(data_path):\n",
    "    \"\"\"\n",
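    "    Load the dataset and min-max scale both feature tables\n",
    "    :param data_path: path to the dataset\n",
    "    :return: feature1, feature2, label: the scaled feature data and the label data\n",
    "    \"\"\"\n",
    "    \n",
    "    # Load the medical data\n",
    "    data_xls = pd.ExcelFile(data_path)\n",
    "    data = {}\n",
    "    \n",
    "    # Read every sheet into a dict keyed by sheet name\n",
    "    for name in data_xls.sheet_names:\n",
    "        df = data_xls.parse(sheet_name=name, header=None)\n",
    "        data[name] = df\n",
    "    \n",
    "    # Extract feature set 1, feature set 2, and the labels\n",
    "    feature1_raw = data['Feature1']\n",
    "    feature2_raw = data['Feature2']\n",
    "    label = data['label']\n",
    "\n",
    "    # Initialize a scaler and apply it to each feature table (fit_transform refits it on every call)\n",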
    "    scaler = MinMaxScaler()\n",
    "    feature1 = pd.DataFrame(scaler.fit_transform(feature1_raw))\n",
    "    feature2 = pd.DataFrame(scaler.fit_transform(feature2_raw))\n",
    "\n",
    "    return feature1,feature2,label"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>6660</th>\n",
       "      <th>6661</th>\n",
       "      <th>6662</th>\n",
       "      <th>6663</th>\n",
       "      <th>6664</th>\n",
       "      <th>6665</th>\n",
       "      <th>6666</th>\n",
       "      <th>6667</th>\n",
       "      <th>6668</th>\n",
       "      <th>6669</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.811348</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.414645</td>\n",
       "      <td>0.201443</td>\n",
       "      <td>0.35027</td>\n",
       "      <td>0.662461</td>\n",
       "      <td>0.289633</td>\n",
       "      <td>0.372597</td>\n",
       "      <td>0.316459</td>\n",
       "      <td>0.49556</td>\n",
       "      <td>...</td>\n",
       "      <td>0.855183</td>\n",
       "      <td>0.670516</td>\n",
       "      <td>0.813051</td>\n",
       "      <td>0.594934</td>\n",
       "      <td>0.757542</td>\n",
       "      <td>0.567247</td>\n",
       "      <td>0.632496</td>\n",
       "      <td>0.578548</td>\n",
       "      <td>0.732079</td>\n",
       "      <td>0.857594</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 6670 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       0     1         2         3        4         5         6         7     \\\n",
       "0  0.811348   0.0  0.414645  0.201443  0.35027  0.662461  0.289633  0.372597   \n",
       "\n",
       "       8        9     ...      6660      6661      6662      6663      6664  \\\n",
       "0  0.316459  0.49556  ...  0.855183  0.670516  0.813051  0.594934  0.757542   \n",
       "\n",
       "       6665      6666      6667      6668      6669  \n",
       "0  0.567247  0.632496  0.578548  0.732079  0.857594  \n",
       "\n",
       "[1 rows x 6670 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>367</th>\n",
       "      <th>368</th>\n",
       "      <th>369</th>\n",
       "      <th>370</th>\n",
       "      <th>371</th>\n",
       "      <th>372</th>\n",
       "      <th>373</th>\n",
       "      <th>374</th>\n",
       "      <th>375</th>\n",
       "      <th>376</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.998685</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.12642</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.019504</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.771442</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1 rows × 377 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        0    1    2        3    4    5    6    7    8    9    ...  367  368  \\\n",
       "0  0.998685  0.0  0.0  0.12642  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0   \n",
       "\n",
       "   369  370       371  372  373       374  375  376  \n",
       "0  0.0  0.0  0.019504  0.0  0.0  0.771442  0.0  0.0  \n",
       "\n",
       "[1 rows x 377 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Dataset path\n",
    "data_path = \"DataSet.xlsx\"\n",
    "\n",
    "# Get the processed feature data and labels\n",
    "feature1,feature2,label = processing_data(data_path)\n",
    "\n",
    "# Display one scaled sample record\n",
    "display(feature1.head(n = 1))\n",
    "display(feature2.head(n = 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 Evaluating Model Performance"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
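    "Our goal is to use medical test data to predict whether a patient has bipolar disorder, or whether treatment for bipolar disorder is effective.\n",
    "Since making these predictions correctly is the crux of the problem, **accuracy** looks like a suitable evaluation metric at first glance.\n",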
    "\n",
    "We divide the algorithm's predictions into four cases:\n",
    "\n",
    "<center><img src=\"https://imgbed.momodel.cn/20200819172058.png\" width=600/></center>\n",
    "    \n",
    "<br>\n",
    "    \n",
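    "**Accuracy** is the proportion of samples that are classified correctly:\n",
    "$$accuracy = \\frac{\\text{number of correctly predicted samples}}{\\text{total number of samples}} = \\frac{TP+TN}{TP+TN+FP+FN}$$\n",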
    "    \n",
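    "However, predicting a bipolar patient as healthy, or predicting an ineffective treatment as effective, carries serious medical risk.\n",
    "We want the model to **find all** bipolar patients, or all cases in which treatment is effective, and we consider this **just as important** as predicting accurately.\n",
    "Therefore, we use **recall** as a second evaluation metric.\n",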
    "    \n",
    "**Precision** is the proportion of samples predicted as positive that are actually positive:\n",
    "$$precision = \\frac{TP}{TP+FP}$$ \n",
    "    \n",
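    "**Recall** is the proportion of actually positive samples that the algorithm predicts as positive:\n",
    "$$recall=\\frac{TP}{TP+FN}$$\n",
    "\n",
    "We use the **F-beta score** as the evaluation metric because it takes both precision and recall into account:\n",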
    "\n",
    "$$ F_{\\beta} = (1 + \\beta^2) \\cdot \\frac{precision \\cdot recall}{\\left( \\beta^2 \\cdot precision \\right) + recall} $$\n",
    "\n",
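    "When $\\beta = 1$, this is the familiar **F1 score**.\n",
    "When $\\beta = 0.5$, precision is weighted more heavily; this is called the **F$_{0.5}$ score** (or simply the F-score, for short). Conversely, $\\beta > 1$ (for example $\\beta = 2$) weights recall more heavily."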
   ]
  },
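  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the formulas above concrete, the sketch below counts TP, TN, FP and FN for a small set of hypothetical predictions (using this dataset's convention of 1 for positive and -1 for negative). It then computes accuracy, precision, recall and $F_{\\beta}$ by hand and checks them against `sklearn.metrics`. The label vectors and the helper `f_beta` are invented for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import accuracy_score, precision_score, recall_score, fbeta_score\n",
    "\n",
    "# Hypothetical ground truth and predictions (1 = positive, -1 = negative)\n",
    "y_true = np.array([1, 1, 1, 1, -1, -1, -1, -1, -1, -1])\n",
    "y_pred = np.array([1, 1, -1, -1, 1, -1, -1, -1, -1, -1])\n",
    "\n",
    "# The four cases of the confusion matrix\n",
    "tp = np.sum((y_pred == 1) & (y_true == 1))\n",
    "tn = np.sum((y_pred == -1) & (y_true == -1))\n",
    "fp = np.sum((y_pred == 1) & (y_true == -1))\n",
    "fn = np.sum((y_pred == -1) & (y_true == 1))\n",
    "\n",
    "accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
    "precision = tp / (tp + fp)\n",
    "recall = tp / (tp + fn)\n",
    "\n",
    "def f_beta(p, r, beta):\n",
    "    # F-beta formula from the text above\n",
    "    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)\n",
    "\n",
    "print('TP, TN, FP, FN:', tp, tn, fp, fn)\n",
    "print('accuracy: ', accuracy, accuracy_score(y_true, y_pred))\n",
    "print('precision:', precision, precision_score(y_true, y_pred))\n",
    "print('recall:   ', recall, recall_score(y_true, y_pred))\n",
    "for beta in (0.5, 1, 2):\n",
    "    # pos_label=1 marks which class counts as positive\n",
    "    library = fbeta_score(y_true, y_pred, beta=beta, pos_label=1)\n",
    "    print('beta =', beta, 'manual:', round(f_beta(precision, recall, beta), 4), 'sklearn:', round(library, 4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because precision (2/3) is higher than recall (1/2) in this example, $F_{0.5}$ (0.625) is larger than $F_1$ (about 0.571), which in turn is larger than $F_2$ (about 0.526): the larger $\\beta$ is, the more a low recall is penalized."
   ]
  },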
  {
   "cell_type": "markdown",
   "metadata": {
    "toc-hr-collapsed": false
   },
   "source": [
    "## 2.4 Feature Selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
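    "An important task in supervised learning is deciding which features provide the strongest predictive power.\n",
    "By focusing on the relationship between a small number of effective features and the label, we can understand that relationship more simply and concretely, which is useful in many situations.\n",
    "\n",
    "As noted in Section 2.1, the features vastly outnumber the samples: the medical imaging data have 6670 dimensions and the gut data 377 dimensions, but there are only 39 samples.\n",
    "\n",
    "Therefore, to train a predictive model, selecting and combining features, as well as choosing and tuning the machine learning model, are critical.\n",
    "\n",
    "In addition, in the context of this project, selecting a small subset of features also has significant medical value.\n",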
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.1 Common Feature Selection Methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Concatenate feature1 and feature2 column-wise into one feature table\n",
    "features = pd.concat([feature1,feature2],axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
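    "(1) **Ranking by feature importance**\n",
    "\n",
    "Choose a classifier that has a `feature_importances_` attribute (for example a decision tree, AdaBoost, or a random forest), or one of sklearn's statistical functions, to score and filter the features."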
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
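    "# Import a supervised model that exposes feature_importances_\n",
    "from sklearn import tree\n",
    "clf = tree.DecisionTreeClassifier(random_state=42)\n",
    "# Flatten the (39, 1) label DataFrame into the 1-D array sklearn expects\n",
    "clf.fit(features, np.array(label).flatten())\n",
    "\n",
    "# Extract the feature importances\n",
    "importances = clf.feature_importances_\n",
    "\n",
    "# Number of features to keep; sort by importance in descending order to get column positions\n",
    "select_feature_number = 5\n",
    "select_features = (np.argsort(importances)[::-1])[:select_feature_number]\n",
    "\n",
    "# Show the selected column positions (0-6669: imaging features, 6670-7046: gut features)\n",
    "print(select_features)"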
   ]
  },
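  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With only 39 samples, a single decision tree makes at most 38 splits. It can therefore assign non-zero importance to at most 38 of the 7047 features, and which ones it picks can be sensitive to small changes in the data. Averaging importances over many trees, as a random forest does, spreads importance across more features. The sketch below illustrates this on synthetic data from `make_classification` (a hypothetical stand-in with the same 39-sample size, not the real dataset) and then keeps the top 5 features in the same way as the cell above. The `_demo` names are illustrative and avoid overwriting notebook variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# Synthetic stand-in: 39 samples, 500 features, only 5 of them informative\n",
    "X_demo, y_demo = make_classification(n_samples=39, n_features=500, n_informative=5,\n",
    "                                     n_redundant=0, random_state=42)\n",
    "\n",
    "tree_imp = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo).feature_importances_\n",
    "forest_demo = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_demo, y_demo)\n",
    "forest_imp = forest_demo.feature_importances_\n",
    "\n",
    "# A single tree has at most n_samples - 1 splits, so most of its importances are exactly 0\n",
    "print('non-zero importances (tree):  ', np.count_nonzero(tree_imp))\n",
    "print('non-zero importances (forest):', np.count_nonzero(forest_imp))\n",
    "\n",
    "# Rank by importance (descending) and keep the top k\n",
    "top_k_demo = 5\n",
    "print(np.argsort(forest_imp)[::-1][:top_k_demo])"
   ]
  },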
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(2) **Correlation coefficient selection**\n",
    "\n",
    "Use statistical functions with `sklearn` to score and filter the features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required libraries\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "from sklearn.feature_selection import chi2\n",
    "from scipy.stats import pearsonr\n",
    "from minepy import MINE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
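    "Compute the correlation between each feature and the label. The most commonly used measure is the Pearson correlation coefficient."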
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
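    "# Score each feature by its absolute Pearson correlation with the label, then keep the top k\n",
    "def pearson_score(X, y):\n",
    "    # SelectKBest expects (scores, p_values); using |r| lets strong negative correlations rank high too\n",
    "    results = np.array([pearsonr(X[:, j], y) for j in range(X.shape[1])])\n",
    "    return np.abs(results[:, 0]), results[:, 1]\n",
    "\n",
    "select_feature_number = 10\n",
    "select_features = SelectKBest(pearson_score,\n",
    "                              k=select_feature_number\n",
    "                             ).fit(features, np.array(label).flatten()).get_support(indices=True)\n",
    "\n",
    "# Show the selected feature indices\n",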
    "print(select_features)"
   ]
  },
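  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pearson's $r$ is the covariance of a feature and the label divided by the product of their standard deviations. The sketch below computes it by hand on small made-up arrays and compares it with `scipy.stats.pearsonr`. It also shows that a mirrored feature gets $r$ of the same size but opposite sign: it separates the classes just as well, which is why ranking by $|r|$ rather than the signed $r$ is usually preferable for feature selection. The arrays and the helper `pearson_manual` are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.stats import pearsonr\n",
    "\n",
    "# Hypothetical labels (1 / -1) and two toy features\n",
    "y_demo = np.array([1, 1, 1, -1, -1, -1], dtype=float)\n",
    "x_pos = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.3])  # larger for positive samples\n",
    "x_neg = -x_pos                                     # mirror image: larger for negative samples\n",
    "\n",
    "def pearson_manual(x, y):\n",
    "    # r = sum of products of centred values / sqrt(product of sums of squares)\n",
    "    xc, yc = x - x.mean(), y - y.mean()\n",
    "    return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))\n",
    "\n",
    "for name, x in [('x_pos', x_pos), ('x_neg', x_neg)]:\n",
    "    r_manual = pearson_manual(x, y_demo)\n",
    "    r_scipy = pearsonr(x, y_demo)[0]\n",
    "    print(name, 'manual:', round(r_manual, 4), 'scipy:', round(r_scipy, 4), '|r|:', round(abs(r_manual), 4))"
   ]
  },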
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
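    "(3) **Chi-square test**\n",
    "\n",
    "The chi-square test measures how far the actual observed values in a sample deviate from the theoretically expected values.\n",
    "This deviation determines the size of the chi-square statistic:\n",
    "the larger the chi-square value, the larger the deviation, and vice versa. If the two are exactly equal, the chi-square value is 0, meaning the observations match the theory perfectly.\n",
    "For feature selection, the theoretical values assume the feature is independent of the class, so a larger chi-square value indicates a feature that is more strongly associated with the label.\n"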
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
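    "# Select features with the chi-square test (chi2 needs non-negative inputs, which min-max scaling guarantees)\n",
    "select_feature_number = 10\n",
    "select_features = SelectKBest(chi2, \n",
    "                              k=select_feature_number\n",
    "                             ).fit(features, np.array(label).flatten()).get_support(indices=True)\n",
    "\n",
    "# Show the selected feature indices\n",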
    "print(select_features)"
   ]
  },
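  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The score from `sklearn.feature_selection.chi2` treats each non-negative feature like a frequency. Within each class $c$ it sums the feature's values (the observed value $O_c$) and compares that sum with the one expected if the feature were independent of the class, $E_c$ (the class proportion times the feature's total), giving $\\chi^2 = \\sum_c \\frac{(O_c - E_c)^2}{E_c}$. The sketch below reproduces this computation by hand for a small made-up matrix and checks it against scikit-learn. It follows our reading of how scikit-learn computes the score, so treat it as an illustration rather than a reference implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.feature_selection import chi2\n",
    "\n",
    "# Hypothetical non-negative features (e.g. after min-max scaling) and labels 1 / -1\n",
    "X_demo = np.array([[0.9, 0.1],\n",
    "                   [0.8, 0.5],\n",
    "                   [0.1, 0.4],\n",
    "                   [0.2, 0.6]])\n",
    "y_demo = np.array([1, 1, -1, -1])\n",
    "\n",
    "# One-hot class indicator matrix, shape (n_samples, n_classes)\n",
    "classes = np.unique(y_demo)\n",
    "Y_onehot = np.column_stack([(y_demo == c).astype(float) for c in classes])\n",
    "\n",
    "observed = Y_onehot.T @ X_demo                       # per-class sums of each feature\n",
    "class_prob = Y_onehot.mean(axis=0)                   # fraction of samples in each class\n",
    "expected = np.outer(class_prob, X_demo.sum(axis=0))  # sums expected under independence\n",
    "chi2_manual = ((observed - expected) ** 2 / expected).sum(axis=0)\n",
    "\n",
    "chi2_sklearn, p_values = chi2(X_demo, y_demo)\n",
    "print('manual: ', chi2_manual)\n",
    "print('sklearn:', chi2_sklearn)"
   ]
  },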
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
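    "(4) **Mutual information**\n",
    "\n",
    "Mutual information is another way to assess the dependence between a categorical independent variable and a categorical dependent variable."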
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select features with the maximal information coefficient (MIC), a mutual-information-based score\n",
    "# MINE is not written as a plain function, so mic() wraps it and returns a (score, p-value) pair; the p-value is fixed at 0.5 as a placeholder\n",
    "def mic(x, y):\n",
    "    m = MINE()\n",
    "    m.compute_score(x, y)\n",
    "    return (m.mic(), 0.5)\n",
    "\n",
    "select_feature_number = 5\n",
    "select_features = SelectKBest(lambda X, Y: tuple(map(tuple, np.array(list(map(lambda x: mic(x, Y), X.T))).T)), \n",
    "                              k=select_feature_number\n",
    "                             ).fit(features, np.array(label).flatten()).get_support(indices=True)\n",
    "\n",
    "# Show the selected feature indices\n",
    "print(select_features)"
   ]
  },
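  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mutual information measures how much knowing one variable reduces uncertainty about the other: $I(X;Y) = \\sum_{x,y} p(x,y) \\log \\frac{p(x,y)}{p(x)\\,p(y)}$. The MIC score from `minepy` used above is a related but different quantity (a normalized, grid-based estimate designed for continuous variables). The sketch below computes plain mutual information for a hypothetical discretized feature by hand and checks it against `sklearn.metrics.mutual_info_score`, which uses the natural logarithm. The data and the helper `mutual_info_manual` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import mutual_info_score\n",
    "\n",
    "# Hypothetical discretized feature (0 = low, 1 = high) and labels 1 / -1\n",
    "x_demo = np.array([0, 0, 0, 1, 1, 1, 1, 0])\n",
    "y_demo = np.array([1, 1, 1, -1, -1, -1, 1, -1])\n",
    "\n",
    "def mutual_info_manual(x, y):\n",
    "    # I(X;Y) = sum over (a, b) of p(a, b) * log(p(a, b) / (p(a) * p(b))), natural log\n",
    "    mi = 0.0\n",
    "    for a in np.unique(x):\n",
    "        for b in np.unique(y):\n",
    "            p_ab = np.mean((x == a) & (y == b))\n",
    "            if p_ab > 0:  # terms with p(a, b) = 0 contribute nothing\n",
    "                p_a, p_b = np.mean(x == a), np.mean(y == b)\n",
    "                mi += p_ab * np.log(p_ab / (p_a * p_b))\n",
    "    return mi\n",
    "\n",
    "print('manual: ', mutual_info_manual(x_demo, y_demo))\n",
    "print('sklearn:', mutual_info_score(x_demo, y_demo))"
   ]
  },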
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
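    "(5) **Dimensionality reduction with t-SNE**\n",
    "\n",
    "Dimensionality reduction uses some mapping to project data points from the original high-dimensional space into a lower-dimensional space.\n",
    "Because the reduced dimensions are functions of the original features, dimensionality reduction, unlike feature selection, changes the feature values and loses some of the feature information.\n",
    "However, it lets us observe and visualize the features in low dimensions, which can guide further selection."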
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required libraries\n",
    "from sklearn.decomposition import PCA\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.manifold import TSNE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
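    "t-SNE stands for t-distributed Stochastic Neighbor Embedding: it combines the t-distribution with stochastic neighbor embedding (SNE). Simply put, t-SNE is a data visualization tool that reduces high-dimensional data to 2 or 3 dimensions, so that the samples can be plotted in a plane or in 3-D space to inspect how the data are distributed."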
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
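    "# Reduce to 2 dimensions; random_state makes the embedding reproducible\n",
    "tsne = TSNE(n_components=2, random_state=42)\n",
    "feature_tsne = tsne.fit_transform(features)\n",
    "\n",
    "# Flatten the (39, 1) label DataFrame into a 1-D array of class labels (1 / -1) used as point colours\n",
    "tsne_label = np.array(label).flatten()\n",
    "\n",
    "# Visualize the embedding, coloured by class\n",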
    "plt.scatter(feature_tsne[:, 0], feature_tsne[:, 1], c=tsne_label)\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
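    "(6) **Dimensionality reduction with principal component analysis (PCA)**\n",
    "\n",
    "Principal Component Analysis (PCA) is the most widely used linear dimensionality reduction method. It uses a linear projection to map high-dimensional data into a lower-dimensional space, choosing the projection directions that maximize the variance of the projected data, so that fewer dimensions retain as much of the original data's characteristics as possible."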
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
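    "# Reduce to 2 dimensions\n",
    "pca = PCA(n_components=2)\n",
    "feature_pca = pca.fit_transform(features)\n",
    "\n",
    "# Flatten the (39, 1) label DataFrame into a 1-D array of class labels (1 / -1) used as point colours\n",
    "pca_label = np.array(label).flatten()\n",
    "\n",
    "# Visualize the projection, coloured by class\n",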
    "plt.scatter(feature_pca[:, 0], feature_pca[:, 1], c=pca_label)\n",
    "plt.show()"
   ]
  },
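  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PCA's maximum-variance objective can be checked directly. After centering the data, the principal directions are the right singular vectors of the data matrix, and the variance along the $k$-th direction is $s_k^2/(n-1)$, where $s_k$ is the $k$-th singular value and $n$ the number of samples. The sketch below verifies this against `sklearn.decomposition.PCA` on random data, a hypothetical stand-in for `features` that likewise has more columns than rows. Projections are compared up to sign, because each principal direction is only defined up to a sign flip."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.decomposition import PCA\n",
    "\n",
    "# Random stand-in for a wide feature matrix: 39 samples, 200 features\n",
    "rng = np.random.RandomState(0)\n",
    "X_demo = rng.rand(39, 200)\n",
    "\n",
    "pca_demo = PCA(n_components=2, svd_solver='full')\n",
    "X_demo_pca = pca_demo.fit_transform(X_demo)\n",
    "\n",
    "# Manual PCA: center each column, then take the SVD of the centered matrix\n",
    "Xc = X_demo - X_demo.mean(axis=0)\n",
    "U, S, Vt = np.linalg.svd(Xc, full_matrices=False)\n",
    "manual_var = S[:2] ** 2 / (X_demo.shape[0] - 1)  # variance along each principal direction\n",
    "manual_proj = Xc @ Vt[:2].T                      # projection onto the first two directions\n",
    "\n",
    "print('sklearn variance:', pca_demo.explained_variance_)\n",
    "print('manual variance: ', manual_var)\n",
    "print('explained variance ratio:', pca_demo.explained_variance_ratio_)"
   ]
  },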
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
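    "(7) **Dual-modality feature selection and fusion**\n",
    "\n",
    "All of the feature selection above was performed on the medical imaging data and gut data simply concatenated together.\n",
    "In fact, the two modalities have different distributions and medical meanings, so a more reasonable approach is to select features from each modality separately and then fuse them with an appropriate algorithm."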
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "select feature1 name: [1242 2060 2064 3290 3912]\n",
      "select feature2 name: [ 30  56  77 134 247]\n",
      "new_features shape: (39, 10)\n",
      "new feature:         1242      2060      2064      3290      3912      30        56    \\\n",
      "0   0.485120  0.791782  0.636598  0.736082  0.714859  0.000000  0.000000   \n",
      "1   0.538347  0.691736  0.936194  0.554668  0.904123  0.000000  0.000000   \n",
      "2   0.577353  0.836702  0.751379  0.749428  0.789394  0.000000  0.000000   \n",
      "3   0.886068  1.000000  0.879182  0.977530  0.906589  0.000000  0.000000   \n",
      "4   0.359287  0.548250  0.607674  0.728504  0.265145  0.000000  0.000000   \n",
      "5   0.254868  0.849786  0.693657  0.450962  0.694317  0.000000  0.000000   \n",
      "6   1.000000  0.957640  0.914794  0.925708  0.790024  0.000000  0.000000   \n",
      "7   0.563647  0.837080  0.853095  0.740983  0.862940  0.000000  0.000000   \n",
      "8   0.485647  0.562818  0.674462  0.743994  0.762817  0.000000  0.000000   \n",
      "9   0.994406  0.938872  1.000000  0.881409  0.948843  0.460018  0.000000   \n",
      "10  0.860102  0.908292  0.920159  1.000000  0.865796  0.000000  0.000000   \n",
      "11  0.793822  0.962327  0.932749  0.665052  0.860803  0.000000  0.000000   \n",
      "12  0.945940  0.896814  0.887113  0.783506  1.000000  1.000000  0.188081   \n",
      "13  0.524798  0.664367  0.737434  0.533417  0.873694  0.706379  1.000000   \n",
      "14  0.388228  0.630693  0.503046  0.705864  0.409934  0.256065  0.833242   \n",
      "15  0.832236  0.810882  0.695419  0.583099  0.772293  0.000000  0.995626   \n",
      "16  0.756643  0.994071  0.957064  0.838550  0.645884  0.000000  0.000000   \n",
      "17  0.038992  0.715588  0.650611  0.212007  0.223187  0.000000  0.000000   \n",
      "18  0.121516  0.321102  0.575681  0.291984  0.420792  0.000000  0.000000   \n",
      "19  0.418239  0.427844  0.502626  0.576584  0.561078  0.000000  0.000000   \n",
      "20  0.306087  0.392751  0.493941  0.458827  0.000000  0.000000  0.000000   \n",
      "21  0.499689  0.835374  0.735431  0.598486  0.064861  0.000000  0.000000   \n",
      "22  0.244373  0.738217  0.682258  0.636517  0.734340  0.000000  0.000000   \n",
      "23  0.493523  0.774362  0.629762  0.717919  0.388901  0.000000  0.000000   \n",
      "24  0.567852  0.375730  0.544477  0.675127  0.395419  0.000000  0.000000   \n",
      "25  0.911883  0.784717  0.623359  0.930026  0.741686  0.000000  0.000000   \n",
      "26  0.494330  0.753010  0.732799  0.633944  0.591089  0.000000  0.000000   \n",
      "27  0.500004  0.363097  0.414599  0.370212  0.697223  0.000000  0.000000   \n",
      "28  0.077868  0.000000  0.000000  0.447338  0.069420  0.000000  0.000000   \n",
      "29  0.417193  0.797927  0.666474  0.591786  0.603721  0.000000  0.000000   \n",
      "30  0.296600  0.319136  0.511441  0.771938  0.378031  0.000000  0.000000   \n",
      "31  0.000000  0.754988  0.727081  0.432213  0.282815  0.000000  0.000000   \n",
      "32  0.165449  0.096433  0.273646  0.000000  0.505885  0.000000  0.000000   \n",
      "33  0.491694  0.735635  0.576623  0.467502  0.765935  0.000000  0.000000   \n",
      "34  0.697200  0.675813  0.574329  0.615512  0.641911  0.000000  0.000000   \n",
      "35  0.333347  0.484421  0.590436  0.419312  0.124068  0.000000  0.000000   \n",
      "36  0.377530  0.536159  0.534318  0.272905  0.793306  0.000000  0.000000   \n",
      "37  0.674879  0.819923  0.768804  0.637472  0.671676  0.000000  0.000000   \n",
      "38  0.264341  0.645707  0.627354  0.555244  0.342143  0.000000  0.185894   \n",
      "\n",
      "        77        134       247   \n",
      "0   0.000000  0.010124  0.398846  \n",
      "1   0.000000  0.029143  0.000000  \n",
      "2   0.456466  0.000000  0.000000  \n",
      "3   0.000000  0.002503  0.514362  \n",
      "4   0.000000  0.084295  0.728069  \n",
      "5   0.000000  0.006915  0.000000  \n",
      "6   1.000000  0.008958  0.441346  \n",
      "7   0.000000  0.001405  0.411093  \n",
      "8   0.000000  0.000491  1.000000  \n",
      "9   0.510659  0.004994  0.000000  \n",
      "10  0.000000  0.038953  0.000000  \n",
      "11  0.754105  0.000000  0.316172  \n",
      "12  0.000000  0.303992  0.396677  \n",
      "13  0.000000  0.460967  0.398265  \n",
      "14  0.884841  0.480254  0.737637  \n",
      "15  0.000000  0.348158  0.148825  \n",
      "16  0.000000  1.000000  0.052300  \n",
      "17  0.000000  0.009539  0.000000  \n",
      "18  0.000000  0.007717  0.310273  \n",
      "19  0.000000  0.002193  0.036049  \n",
      "20  0.000000  0.056449  0.000000  \n",
      "21  0.000000  0.173612  0.000102  \n",
      "22  0.000000  0.000000  0.000000  \n",
      "23  0.000000  0.009914  0.039162  \n",
      "24  0.000000  0.000000  0.163337  \n",
      "25  0.000000  0.019923  0.223655  \n",
      "26  0.000000  0.012904  0.000000  \n",
      "27  0.010563  0.000158  0.135240  \n",
      "28  0.000000  0.005839  0.190085  \n",
      "29  0.000000  0.000572  0.093035  \n",
      "30  0.000000  0.010333  0.000000  \n",
      "31  0.000000  0.004095  0.026748  \n",
      "32  0.000000  0.023930  0.461462  \n",
      "33  0.000000  0.000000  0.519488  \n",
      "34  0.000000  0.004140  0.280961  \n",
      "35  0.000000  0.011823  0.081191  \n",
      "36  0.000000  0.010505  0.238323  \n",
      "37  0.036167  0.090063  0.150734  \n",
      "38  0.000000  0.000000  0.021735  \n"
     ]
    }
   ],
   "source": [
    "from scipy.stats import pearsonr\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "\n",
    "\n",
    "def pearson_score(X, y):\n",
    "    # Score function for SelectKBest: Pearson r and p-value of each column of X against y\n",
    "    stats = np.array([pearsonr(column, y) for column in np.asarray(X).T])\n",
    "    return stats[:, 0], stats[:, 1]\n",
    "\n",
    "\n",
    "# Compute the Pearson correlation between each feature and the label, then rank and select the features of each modality separately\n",
    "select_feature_number = 5\n",
    "label_array = np.array(label).flatten()\n",
    "select_feature1 = SelectKBest(pearson_score, k=select_feature_number).fit(feature1, label_array).get_support(indices=True)\n",
    "select_feature2 = SelectKBest(pearson_score, k=select_feature_number).fit(feature2, label_array).get_support(indices=True)\n",
    "\n",
    "# Inspect the positional indices of the selected features\n",
    "print(\"select feature1 index:\", select_feature1)\n",
    "print(\"select feature2 index:\", select_feature2)\n",
    "\n",
    "# Select features from both modalities and merge them\n",
    "new_features = pd.concat([feature1[feature1.columns.values[select_feature1]],\n",
    "                          feature2[feature2.columns.values[select_feature2]]], axis=1)\n",
    "print(\"new_features shape:\", new_features.shape)\n",
    "# print(\"new feature:\", new_features)"
   ]
  },
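  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The score function above computes Pearson's r between each feature column and the label. The cell below is a minimal NumPy sketch of the same computation on synthetic data. The helpers `pearson_r_columns` and `top_k_by_abs_r` and the arrays `X_demo` and `y_demo` are hypothetical. The sketch also ranks by |r| rather than by signed r. `SelectKBest` keeps the k largest scores, so with signed r a strongly *negative* correlation is ranked last, while ranking by absolute value keeps it. Using |r| is an assumed alternative, not what the code above does.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def pearson_r_columns(X, y):\n",
    "    # Pearson r of every column of X with the label vector y\n",
    "    X = np.asarray(X, dtype=float)\n",
    "    y = np.asarray(y, dtype=float).ravel()\n",
    "    Xc = X - X.mean(axis=0)  # center each feature\n",
    "    yc = y - y.mean()        # center the label\n",
    "    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())\n",
    "    with np.errstate(invalid='ignore', divide='ignore'):\n",
    "        return (Xc.T @ yc) / denom  # constant columns give nan\n",
    "\n",
    "\n",
    "def top_k_by_abs_r(X, y, k):\n",
    "    # Indices of the k features with the largest |r|, in ascending order like get_support(indices=True)\n",
    "    r = np.nan_to_num(pearson_r_columns(X, y), nan=0.0)\n",
    "    return np.sort(np.argsort(-np.abs(r))[:k])\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "y_demo = rng.integers(0, 2, size=39)  # synthetic binary labels\n",
    "X_demo = rng.normal(size=(39, 6))     # 6 synthetic features\n",
    "X_demo[:, 1] += 2.0 * y_demo          # positively related to the label\n",
    "X_demo[:, 4] -= 2.0 * y_demo          # negatively related to the label\n",
    "\n",
    "r = pearson_r_columns(X_demo, y_demo)\n",
    "print(np.round(r, 3))\n",
    "print(top_k_by_abs_r(X_demo, y_demo, k=2))\n",
    "```\n",
    "\n",
    "With this synthetic data, columns 1 and 4 should be the two selected, because the label shifts one up and the other down. Ranking by signed r would drop column 4."
   ]
  },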
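  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.2 Feature Selection\n",
    "\n",
    "Define a `feature_select` function that performs the feature selection.\n",
    "\n",
    "Using the Pearson correlation coefficient as an example, it merges the two modalities, keeps the features most correlated with the label and returns the new feature data."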
   ]
  },
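  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def feature_select(feature1, feature2, label):\n",
    "    \"\"\"\n",
    "    Feature selection\n",
    "    :param feature1, feature2, label: preprocessed input features of the two modalities, and the labels\n",
    "    :return: new_features, label: feature data after selection, and the labels\n",
    "    \"\"\"\n",
    "\n",
    "    # Merge the features of both modalities\n",
    "    features = pd.concat([feature1, feature2], axis=1)\n",
    "\n",
    "    def pearson_score(X, y):\n",
    "        # Score function for SelectKBest: Pearson r and p-value of each column of X against y\n",
    "        stats = np.array([pearsonr(column, y) for column in np.asarray(X).T])\n",
    "        return stats[:, 0], stats[:, 1]\n",
    "\n",
    "    # Rank all features by their Pearson correlation with the label and keep the top select_feature_number\n",
    "    select_feature_number = 12\n",
    "    select_features = SelectKBest(pearson_score, k=select_feature_number\n",
    "                                  ).fit(features, np.array(label).flatten()).get_support(indices=True)\n",
    "\n",
    "    # Inspect the positional indices of the selected features\n",
    "    print(\"Indices of selected features:\", select_features)\n",
    "\n",
    "    # Keep only the selected columns; iloc selects by position, so this also works\n",
    "    # if feature1 and feature2 share column labels after concatenation\n",
    "    new_features = features.iloc[:, select_features]\n",
    "\n",
    "    # Return the selected data\n",
    "    return new_features, label"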
   ]
  },
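  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`feature_select` ranks features on all 39 samples, before the train/validation/test split in 2.4.3. The validation and test samples have therefore already influenced which features are kept. With many features and few samples, this can make the held-out scores optimistic. One common remedy is sketched below, assuming an absolute-value variant of the Pearson ranking: wrap the selector and the classifier in a scikit-learn `Pipeline`, so the selector is refit on the training folds only. The data are synthetic, and `abs_pearson_score`, `X_demo` and `y_demo` are hypothetical names.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
    "from sklearn.pipeline import Pipeline\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "\n",
    "def abs_pearson_score(X, y):\n",
    "    # |Pearson r| of each column of X with y, used as the SelectKBest score\n",
    "    X = np.asarray(X, dtype=float)\n",
    "    y = np.asarray(y, dtype=float)\n",
    "    Xc = X - X.mean(axis=0)\n",
    "    yc = y - y.mean()\n",
    "    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())\n",
    "    with np.errstate(invalid='ignore', divide='ignore'):\n",
    "        r = (Xc.T @ yc) / denom\n",
    "    return np.nan_to_num(np.abs(r), nan=0.0)\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "X_demo = rng.normal(size=(39, 300))     # pure noise: many features, few samples\n",
    "y_demo = np.array([0] * 20 + [1] * 19)  # synthetic labels\n",
    "\n",
    "pipe = Pipeline([\n",
    "    ('select', SelectKBest(score_func=abs_pearson_score, k=12)),\n",
    "    ('clf', DecisionTreeClassifier(random_state=42)),\n",
    "])\n",
    "\n",
    "# cross_val_score refits the whole pipeline on each training fold,\n",
    "# so the held-out fold never influences which features are selected\n",
    "cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)\n",
    "scores = cross_val_score(pipe, X_demo, y_demo, cv=cv, scoring='f1')\n",
    "print(scores.round(3), scores.mean().round(3))\n",
    "```\n",
    "\n",
    "`X_demo` is pure noise, so a leak-free estimate like this one should stay roughly at chance level. Selecting the 12 features on all 39 samples first and then cross-validating can look noticeably better than chance."
   ]
  },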
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the feature selection result\n",
    "new_features, label = feature_select(feature1, feature2, label)\n",
    "print(\"Feature shape:\", new_features.shape)"
   ]
  },
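  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.3 Shuffle and Split the Data\n",
    "\n",
    "Feature selection is complete and we now have the new feature data.\n",
    "Next, the data (the features together with their labels) are shuffled and split into a training set and a test set.\n",
    "80% of the data is used for training and 20% for testing.\n",
    "The training data is then split again into a training set and a validation set, which are used to select and tune the model.\n",
    "Both splits are stratified on the label (`stratify`), so each subset keeps roughly the class ratio of the full data."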
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "\n",
    "def data_split(features, label):\n",
    "    \"\"\"\n",
    "    Data split\n",
    "    :param features: input features after feature selection\n",
    "    :param label: label data\n",
    "    :return: X_train: training features after the split\n",
    "             X_val: validation features after the split\n",
    "             X_test: test features after the split\n",
    "             y_train: training labels after the split\n",
    "             y_val: validation labels after the split\n",
    "             y_test: test labels after the split\n",
    "    \"\"\"\n",
    "    # Split features and labels into a training set (80%) and a test set (20%), stratified by label\n",
    "    X_train, X_test, y_train, y_test = train_test_split(features, label, test_size=0.2, random_state=0, stratify=label)\n",
    "\n",
    "    # Split X_train and y_train further into a training set and a validation set\n",
    "    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0, stratify=y_train)\n",
    "\n",
    "    return X_train, X_val, X_test, y_train, y_val, y_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training set has 24 samples.\n",
      "Validation set has 7 samples.\n",
      "Testing set has 8 samples.\n"
     ]
    }
   ],
   "source": [
    "# Split the data\n",
    "X_train, X_val, X_test, y_train, y_val, y_test = data_split(new_features, label)\n",
    "\n",
    "# Show the size of each subset\n",
    "print(\"Training set has {} samples.\".format(X_train.shape[0]))\n",
    "print(\"Validation set has {} samples.\".format(X_val.shape[0]))\n",
    "print(\"Testing set has {} samples.\".format(X_test.shape[0]))"
   ]
  },
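  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sample counts printed above follow from how `train_test_split` rounds a float `test_size`. The test part gets ceil(test_size × n) samples and the training part keeps the rest. For 39 samples that gives 8 test samples and 31 remaining. Splitting the 31 again gives 7 validation and 24 training samples. The cell below is a quick check on synthetic stand-in data of the same size (the labels in `y_demo` are made up):\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "n = 39\n",
    "n_test = math.ceil(0.2 * n)            # 8\n",
    "n_val = math.ceil(0.2 * (n - n_test))  # 7\n",
    "n_train = n - n_test - n_val           # 24\n",
    "print(n_train, n_val, n_test)\n",
    "\n",
    "X_demo = np.arange(n).reshape(-1, 1)\n",
    "y_demo = np.array([0] * 21 + [1] * 18)  # synthetic labels\n",
    "X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo)\n",
    "X_tr, X_va, y_tr, y_va = train_test_split(X_tr, y_tr, test_size=0.2, random_state=0, stratify=y_tr)\n",
    "print(len(X_tr), len(X_va), len(X_te))\n",
    "```"
   ]
  },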
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 Supervised Learning Models"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The supervised learning models in [`scikit-learn`](http://scikit-learn.org/stable/supervised_learning.html) include:\n",
    "- Gaussian Naive Bayes (GaussianNB)\n",
    "- Decision trees (DecisionTree)\n",
    "- Ensemble methods (Bagging, AdaBoost, Random Forest, Gradient Boosting)\n",
    "- K-nearest neighbors (K Nearest Neighbors)\n",
    "- Stochastic gradient descent classifier (SGDClassifier)\n",
    "- Support vector machines (SVM)\n",
    "- Logistic regression (LogisticRegression)\n",
    "\n",
    "Choose the models from this list that suit our problem."
   ]
  },
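  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To evaluate each model you choose correctly, it is important to build a training and validation pipeline. It should let you quickly fit each model on the training set and predict on the validation set.\n",
    "\n",
    " - Import `accuracy_score`, `recall_score` and `fbeta_score` from [`sklearn.metrics`](http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics).\n",
    " - Fit the learner on the training set and record the training time.\n",
    " - Predict on the training set and the validation set and record the prediction time.\n",
    " - Compute accuracy, recall and F-score of the predictions on the training set.\n",
    " - Compute accuracy, recall and F-score of the predictions on the validation set."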
   ]
  },
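  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below is a small worked example of these metrics on a hypothetical 7-sample validation set with made-up labels. Each metric is computed by hand from the confusion counts and then checked against the `sklearn.metrics` functions. With β = 1, `fbeta_score` is the F1 score, the harmonic mean of precision and recall. `precision_score` is imported only for the comparison.\n",
    "\n",
    "```python\n",
    "from sklearn.metrics import accuracy_score, fbeta_score, precision_score, recall_score\n",
    "\n",
    "# Hypothetical 7-sample validation set\n",
    "y_true = [1, 0, 1, 1, 0, 0, 1]\n",
    "y_pred = [1, 0, 0, 1, 0, 1, 0]\n",
    "\n",
    "pairs = list(zip(y_true, y_pred))\n",
    "tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives: 2\n",
    "fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives: 1\n",
    "fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives: 2\n",
    "tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives: 2\n",
    "\n",
    "accuracy = (tp + tn) / len(y_true)                  # 4/7\n",
    "recall = tp / (tp + fn)                             # 2/4\n",
    "precision = tp / (tp + fp)                          # 2/3\n",
    "f1 = 2 * precision * recall / (precision + recall)  # 4/7\n",
    "\n",
    "print(round(accuracy, 4), round(accuracy_score(y_true, y_pred), 4))\n",
    "print(round(recall, 4), round(recall_score(y_true, y_pred), 4))\n",
    "print(round(precision, 4), round(precision_score(y_true, y_pred), 4))\n",
    "print(round(f1, 4), round(fbeta_score(y_true, y_pred, beta=1), 4))\n",
    "```\n",
    "\n",
    "On a set this small, a single misclassification moves accuracy by 1/7 ≈ 0.14."
   ]
  },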
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the evaluation metrics from sklearn: fbeta_score, accuracy_score, recall_score\n",
    "from time import time\n",
    "\n",
    "from sklearn.metrics import fbeta_score, accuracy_score, recall_score\n",
    "\n",
    "\n",
    "def train_predict(learner, X_train, y_train, X_val, y_val):\n",
    "    '''\n",
    "    Train a model and evaluate it on the validation set\n",
    "    :param learner: supervised learning model\n",
    "    :param X_train: training features\n",
    "    :param y_train: training labels\n",
    "    :param X_val: validation features\n",
    "    :param y_val: validation labels\n",
    "    :return: results: training and validation metrics\n",
    "    '''\n",
    "\n",
    "    results = {}\n",
    "\n",
    "    # Fit the learner on the training data\n",
    "    start = time()  # start time\n",
    "    learner = learner.fit(X_train, y_train)\n",
    "    end = time()  # end time\n",
    "\n",
    "    # Training time (not stored in the returned results)\n",
    "    # results['train_time'] = end - start\n",
    "\n",
    "    # Predict on the validation set and the training set\n",
    "    start = time()  # start time\n",
    "    predictions_val = learner.predict(X_val)\n",
    "    predictions_train = learner.predict(X_train)\n",
    "    end = time()  # end time\n",
    "\n",
    "    # Prediction time (not stored in the returned results)\n",
    "    # results['pred_time'] = end - start\n",
    "\n",
    "    # Accuracy on the training data\n",
    "    results['acc_train'] = round(accuracy_score(y_train, predictions_train), 4)\n",
    "\n",
    "    # Accuracy on the validation data\n",
    "    results['acc_val'] = round(accuracy_score(y_val, predictions_val), 4)\n",
    "\n",
    "    # Recall on the training data\n",
    "    results['recall_train'] = round(recall_score(y_train, predictions_train), 4)\n",
    "\n",
    "    # Recall on the validation data\n",
    "    results['recall_val'] = round(recall_score(y_val, predictions_val), 4)\n",
    "\n",
    "    # F-score (beta = 1, i.e. F1) on the training data\n",
    "    results['f_train'] = round(fbeta_score(y_train, predictions_train, beta=1), 4)\n",
    "\n",
    "    # F-score (beta = 1) on the validation data\n",
    "    results['f_val'] = round(fbeta_score(y_val, predictions_val, beta=1), 4)\n",
    "\n",
    "    # Report success; the learner was fitted on the training set, so report its size\n",
    "    print(\"{} trained on {} samples.\".format(learner.__class__.__name__, len(X_train)))\n",
    "\n",
    "    # Return the results\n",
    "    return results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code cell below does the following:\n",
    "- Imports three supervised learning models.\n",
    "- Initializes the three models and stores them in `'clf_A'`, `'clf_B'` and `'clf_C'`.\n",
    "  - Uses each model's default parameter values; one of the models will be tuned in the next section.\n",
    "  - Sets `random_state` (if the model has this parameter)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DecisionTreeClassifier trained on 24 samples.\n",
      "GaussianNB trained on 24 samples.\n",
      "SVC trained on 24 samples.\n",
      "Gaussian Naive Bayes results: {'acc_train': 0.7917, 'acc_val': 0.5714, 'recall_train': 0.5455, 'recall_val': 0.3333, 'f_train': 0.7059, 'f_val': 0.4}\n",
      "Support vector machine results: {'acc_train': 0.9583, 'acc_val': 0.7143, 'recall_train': 1.0, 'recall_val': 0.3333, 'f_train': 0.9565, 'f_val': 0.5}\n",
      "Decision tree results: {'acc_train': 1.0, 'acc_val': 0.7143, 'recall_train': 1.0, 'recall_val': 0.6667, 'f_train': 1.0, 'f_val': 0.6667}\n"
     ]
    }
   ],
   "source": [
    "# Import three supervised learning models from sklearn\n",
    "from sklearn import tree\n",
    "from sklearn import naive_bayes\n",
    "from sklearn import svm\n",
    "from time import time\n",
    "\n",
    "# Initialize the three models\n",
    "clf_A = tree.DecisionTreeClassifier(random_state=42)\n",
    "clf_B = naive_bayes.GaussianNB()\n",
    "clf_C = svm.SVC()\n",
    "\n",
    "# Collect the results of each learner\n",
    "results = {}\n",
    "for clf in [clf_A, clf_B, clf_C]:\n",
    "    clf_name = clf.__class__.__name__\n",
    "    results[clf_name] = train_predict(clf, X_train, y_train, X_val, y_val)\n",
    "\n",
    "# Print the training and validation results of the three models\n",
    "print(\"Gaussian Naive Bayes results:\", results['GaussianNB'])\n",
    "print(\"Support vector machine results:\", results['SVC'])\n",
    "print(\"Decision tree results:\", results['DecisionTreeClassifier'])"
   ]
  },
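  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are only 7 validation samples, so each misclassified sample moves validation accuracy by 1/7 ≈ 0.14. The differences between the three models above may therefore be noise. The cell below sketches one alternative: compare the same three classifiers with stratified k-fold cross-validation, which uses every sample for validation exactly once. The data are synthetic (`make_classification`), standing in for the 31 training-plus-validation samples. The fold count, the `shuffle` setting and the names `X_demo`, `y_demo` and `cv_results` are illustrative choices.\n",
    "\n",
    "```python\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# Synthetic stand-in: 31 samples, 12 features (the count kept by feature_select)\n",
    "X_demo, y_demo = make_classification(n_samples=31, n_features=12, n_informative=4, random_state=0)\n",
    "\n",
    "models = {\n",
    "    'DecisionTreeClassifier': DecisionTreeClassifier(random_state=42),\n",
    "    'GaussianNB': GaussianNB(),\n",
    "    'SVC': SVC(),\n",
    "}\n",
    "cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)\n",
    "\n",
    "cv_results = {}\n",
    "for name, model in models.items():\n",
    "    # One F1 score per fold; mean and spread show how stable the comparison is\n",
    "    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring='f1')\n",
    "    cv_results[name] = (scores.mean(), scores.std())\n",
    "    print(f'{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}')\n",
    "```"
   ]
  },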
  {
   "cell_type": "markdown",
   "metadata": {
    "toc-hr-collapsed": false
   },
   "source": [
    "## 2.6 Improving the Model"
   ]
  },
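  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now choose the **best** of the three supervised learning models.\n",
    "Use grid search on the whole training set (`X_train` and `y_train`) to tune at least one hyperparameter and obtain a better result than the untuned model."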
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tune the parameters of the selected model.\n",
    "\n",
    "Use grid search (`GridSearchCV`) to tune at least one important parameter of the model. Try at least 3 different values for it, and use the whole training set for this process.\n",
    "\n",
    "- Import [`sklearn.model_selection.GridSearchCV`](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) and [`sklearn.metrics.make_scorer`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html).\n",
    "- Initialize the classifier you chose and store it in `clf`.\n",
    "  - Set `random_state` (if the model has this parameter).\n",
    "- Create a dictionary of the parameters you want to tune for this model.\n",
    "  - For example: `parameters = {'parameter' : [list of values]}`.\n",
    "  - **Note:** if your learner has a `max_features` parameter, do not tune it!\n",
    "- Use `make_scorer` to create an `fbeta_score` scoring object (with $\\beta = 1$).\n",
    "- Run a grid search on the classifier `clf` with `scorer` as the scoring function, and store it in `grid_obj`.\n",
    "- Fit the grid search object on the training set (`X_train`, `y_train`) and store the result in `grid_fit`."
   ]
  },
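  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because there are very few training samples, the model overfits severely.\n",
    "\n",
    "Define a function `plot_learning_curve` that plots a learning curve, so that overfitting during training can be observed."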
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from sklearn.model_selection import learning_curve\n",
    "from sklearn.model_selection import ShuffleSplit\n",
    "\n",
    "def plot_learning_curve(estimator, X, y, cv=None, n_jobs=1):\n",
    "    \"\"\"\n",
    "    Plot a learning curve\n",
    "    :param estimator: model to evaluate (may be unfitted; learning_curve clones and refits it)\n",
    "    :param X: feature data\n",
    "    :param y: labels\n",
    "    :param cv: cross-validation splitter or number of folds\n",
    "    :param n_jobs: number of jobs to run in parallel\n",
    "    :return: None; the figure is displayed with plt.show()\n",
    "    \"\"\"\n",
    "    # Scores per training-set size and fold; with no scoring argument, the estimator's own score (accuracy for classifiers) is used\n",
    "    train_sizes, train_scores, test_scores = learning_curve(estimator, X, y, cv=cv, n_jobs=n_jobs)\n",
    "    train_scores_mean = np.mean(train_scores, axis=1)\n",
    "    test_scores_mean = np.mean(test_scores, axis=1)\n",
    "\n",
    "    plt.figure('Learning Curve', facecolor='lightgray')\n",
    "    plt.title('Learning Curve')\n",
    "    plt.xlabel('train size')\n",
    "    plt.ylabel('score')\n",
    "    plt.grid(linestyle=\":\")\n",
    "    plt.plot(train_sizes, train_scores_mean, label='training score')\n",
    "    plt.plot(train_sizes, test_scores_mean, label='val score')\n",
    "    plt.legend()\n",
    "    plt.show()"
   ]
  },
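  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The curve above plots only the mean over the cross-validation folds. With so few samples the per-fold scores vary a lot, so shading ±1 standard deviation helps judge whether the gap between the two curves is meaningful. The cell below is a sketch that assumes the same `learning_curve` call. The function `plot_learning_curve_with_std`, the explicit `train_sizes`, the splitter and the synthetic data are illustrative choices. The Agg backend is selected so the sketch also runs without a display.\n",
    "\n",
    "```python\n",
    "import matplotlib\n",
    "matplotlib.use('Agg')  # non-interactive backend; drop this line inside Jupyter\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import StratifiedKFold, learning_curve\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "\n",
    "def plot_learning_curve_with_std(estimator, X, y, cv):\n",
    "    train_sizes, train_scores, val_scores = learning_curve(\n",
    "        estimator, X, y, cv=cv, train_sizes=np.linspace(0.3, 1.0, 5))\n",
    "    fig, ax = plt.subplots()\n",
    "    for scores, label in [(train_scores, 'training score'), (val_scores, 'val score')]:\n",
    "        mean, std = scores.mean(axis=1), scores.std(axis=1)  # over CV folds\n",
    "        ax.plot(train_sizes, mean, label=label)\n",
    "        ax.fill_between(train_sizes, mean - std, mean + std, alpha=0.2)  # +/- 1 std band\n",
    "    ax.set_xlabel('train size')\n",
    "    ax.set_ylabel('score')\n",
    "    ax.grid(linestyle=':')\n",
    "    ax.legend()\n",
    "    return fig, train_sizes, train_scores, val_scores\n",
    "\n",
    "\n",
    "X_demo, y_demo = make_classification(n_samples=31, n_features=12, random_state=0)\n",
    "fig, sizes, tr, va = plot_learning_curve_with_std(\n",
    "    DecisionTreeClassifier(random_state=42), X_demo, y_demo,\n",
    "    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))\n",
    "```"
   ]
  },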
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import joblib\n",
    "from sklearn.model_selection import GridSearchCV, KFold\n",
    "from sklearn.metrics import make_scorer\n",
    "\n",
    "def search_model(X_train, y_train, X_val, y_val, model_save_path):\n",
    "    \"\"\"\n",
    "    Create, train, tune and save a supervised learning model\n",
    "    :param X_train, y_train: training data\n",
    "    :param X_val, y_val: validation data\n",
    "    :param model_save_path: path and file name for the saved model\n",
    "    \"\"\"\n",
    "\n",
    "    # Create the supervised learning model, a decision tree in this example\n",
    "    clf = tree.DecisionTreeClassifier(random_state=42)\n",
    "\n",
    "    # Parameter grid to search\n",
    "    parameters = {'max_depth': range(5,10),\n",
    "                  'min_samples_split': range(2,10)}\n",
    "\n",
    "    # Create an fbeta_score scorer, here the F-score with beta = 1\n",
    "    scorer = make_scorer(fbeta_score, beta=1)\n",
    "\n",
    "    # 10-fold cross-validation splitter for the grid search\n",
    "    kfold = KFold(n_splits=10)\n",
    "\n",
    "    # Grid search on the classifier with 'scorer' as the scoring function and the CV splitter above;\n",
    "    # scoring must be passed by keyword in recent scikit-learn versions\n",
    "    grid_obj = GridSearchCV(clf, parameters, scoring=scorer, cv=kfold)\n",
    "\n",
    "    # Plot the learning curve\n",
    "    plot_learning_curve(clf, X_train, y_train, cv=kfold, n_jobs=4)\n",
    "\n",
    "    # Fit the grid search object on the training data to find the best parameters\n",
    "    grid_obj.fit(X_train, y_train)\n",
    "\n",
    "    # Get the best estimator and save it\n",
    "    best_clf = grid_obj.best_estimator_\n",
    "    joblib.dump(best_clf, model_save_path)\n",
    "\n",
    "    # Predict with the untuned model and with the tuned model\n",
    "    predictions = (clf.fit(X_train, y_train)).predict(X_val)\n",
    "    best_predictions = best_clf.predict(X_val)\n",
    "\n",
    "    # The tuned model\n",
    "    print(\"best_clf\\n------\")\n",
    "    print(best_clf)\n",
    "\n",
    "    # Report the scores before and after tuning\n",
    "    print(\"\\nUnoptimized model\\n------\")\n",
    "    print(\"Accuracy score on validation data: {:.4f}\".format(accuracy_score(y_val, predictions)))\n",
    "    print(\"Recall score on validation data: {:.4f}\".format(recall_score(y_val, predictions)))\n",
    "    print(\"F-score on validation data: {:.4f}\".format(fbeta_score(y_val, predictions, beta = 1)))\n",
    "    print(\"\\nOptimized Model\\n------\")\n",
    "    print(\"Final accuracy score on the validation data: {:.4f}\".format(accuracy_score(y_val, best_predictions)))\n",
    "    print(\"Recall score on validation data: {:.4f}\".format(recall_score(y_val, best_predictions)))\n",
    "    print(\"Final F-score on the validation data: {:.4f}\".format(fbeta_score(y_val, best_predictions, beta = 1)))\n",
    "\n"
   ]
  },
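  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below is a self-contained sketch of the same grid search on synthetic data. It passes `scoring` by keyword, which recent scikit-learn versions require. It uses `StratifiedKFold` instead of `KFold(n_splits=10)`. With 24 training samples, 10 plain folds hold only 2 or 3 samples each, and a fold with no positive sample makes the F-score ill-defined. The grid mirrors the one above. The data, the fold count and the names `X_demo` and `y_demo` are illustrative.\n",
    "\n",
    "```python\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.metrics import fbeta_score, make_scorer\n",
    "from sklearn.model_selection import GridSearchCV, StratifiedKFold\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# Synthetic stand-in for the 24 training samples with 12 selected features\n",
    "X_demo, y_demo = make_classification(n_samples=24, n_features=12, random_state=0)\n",
    "\n",
    "parameters = {'max_depth': list(range(5, 10)), 'min_samples_split': list(range(2, 10))}\n",
    "scorer = make_scorer(fbeta_score, beta=1)\n",
    "# Stratified folds keep both classes in every validation fold\n",
    "cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)\n",
    "\n",
    "grid_obj = GridSearchCV(DecisionTreeClassifier(random_state=42), parameters, scoring=scorer, cv=cv)\n",
    "grid_obj.fit(X_demo, y_demo)\n",
    "print(grid_obj.best_params_, round(grid_obj.best_score_, 4))\n",
    "```"
   ]
  },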
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAERCAYAAACQIWsgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzsnXtAlFX+uB9UUBEDb4upeFnzntpFEDTvWiZZiRreL5l3c9VtM7d+X7ts1pa17ZbdL4qZJZK6WpaZqakECokYjQKJKKIjiowjw8Aw8/vjjdmQYRh4Z+Y94nn+UYYz5zznvC/zmfd9z/kcn9TUVBsSiUQikfyBOloLSCQSiUQ8ZHCQSCQSSQVkcJBIJBJJBWRwkEgkEkkFZHCQSCQSSQVkcJBIJBJJBWRwkNwwbN26lb/97W+atb9z505WrVrlkbptNhuvvPIK/fv355FHHuG3337zSDsSiavU01pAIrlRuP/++7n//vs9UvfWrVtJSUnhu+++Iz4+nv/7v//j008/9UhbEokryCsHiUQAtm/fzrRp0/D392fo0KGcOHGCoqIirbUkNzEyOEhqBVu2bGHUqFEMGzaMzZs3219/5513GDJkCMOGDWP79u3212fOnMmuXbtYvHgxs2bNAuDw4cPMnDmTV199lX79+jF9+vRyH9Bbt27l6aefLtduz549+frrrxk2bBgjR44kIyMDgAsXLjBx4kSGDBnCypUriYyM5OrVq5X6p6en06FDBwB8fHxYu3YtdevWZebMmRw+fBiAnJwc7rvvvnL/P3jwIA899JC9z7NmzSIlJQWAK1eucO+992Kz2SgsLGTFihUMHTqU6OhosrKyajTOkpsHGRwkNzwZGRnExMTw+eefExsbyzvvvENeXh65ubkcOXKEHTt2sGHDBl577bVy7/vPf/7DmDFjeOONN+yvHTt2jJYtW7J3716uXr3KgQMHqmx///79fPPNN0RERBAbGwvAhg0bCA0NZdu2bXz33XfExsbSuHHjSuu4evUq/v7+9p979OiBr6+v03YNBgPr16/n3//+N6NHjwZgxIgRdueDBw8yZMgQfHx8ePfdd2nZsiXff/89EyZMqDAWEsn1yGcOkhuehIQEcnJyePDBBwEwm81kZWXRp08fli9fTkxMDEeOHOHSpUvl3vfwww8zZMiQcq81adKEKVOm4OPjQ7du3TAajVW2P3/+fHx9fenVqxdHjhwBoH79+hQWFlJaWorFYqmyjnr16lFcXGz/+fXXX+exxx4rV8ZmK58GraioiJUrV3LrrbfaXxs+fDh/+ctfWLhwIQcOHGDs2LEA/PTTT+Tm5rJlyxZsNhtBQUFVOklubuSVg6RWMHr0aPbu3cvevXv59ttv6dmzJ8nJySxdupR27do5nGXUu3fvCq+1bt0aHx+farUdEhICUO597du358cff2T8+PEsW7as3FVBZXWcPXsWAIvFwvr166lXr/x3N71eX+7nP/3pT+UCA0Dz5s3x8/MjLy+PEydOcNddd9l/t2bNGvsYffTRR9Xqo+TmQwYHyQ1PaGgo+/fvR6/XYzAYGD9+PFlZWRw7doxu3boxcuRIvvvuO5fqqlOn+n8Sjt7z5Zdf8tJLL7F7924mTJhQZR0jR44kJiYGs9nM1q1b6datG/7+/gQEBHDu3DlsNhsbNmxwyWf48OG899573HXXXXa3vn37EhcXR2lpKd9//z0LFiyoXiclNx3ytpLkhmLXrl3s2bPH/vOSJUuYOnUq8+fPZ8qUKVgsFqZPn06XLl0ICAhg27ZtDB06lPvvvx9/f3+ysrJo3769xz0HDhzI7Nmz8fX1pXnz5syePZuRI0dWWn7WrFnk5+dz33330bJlS/uVzuTJk3n++efZsWMHgwcP5vjx41W2PXz4cO677z7ee+89+2vz5s3j+eefZ+jQobRo0YIXXnhBfScltRofuZ+DROJeLBYL
EydOZP369dSvX5/9+/fz1ltv2R9WSyQ3AvLKQSJxM/Xq1aNv375ERkZisVgICgpi8eLFWmtJJNVCXjlIJBKJpALygbREIpFIKiCDg0QikUgqcMM+cxg8eLBXZp1IJBJJbSIzM5Mff/yxynI3bHBo3769fTWqu8nMzKRjx44eqdsdiO4H4jtKP3VIP3Vo6dejRw+XysnbSg5o2rSp1gpOEd0PxHeUfuqQfuoQ3Q9kcHBIYWGh1gpOEd0PxHeUfuqQfuoQ3Q9kcHBITVIoeBPR/UB8R+mnDumnDtH9QAYHh1SVKllrRPcD8R2lnzqknzpE9wMZHBziSppmLRHdD8R3lH7qkH7qEN0PZHBwSPPmzbVWcIrofiC+o/RTh/RTh+h+4KHgUFJSwqJFiyr9vdlsZuHChYwdO5YVK1Zgs9kcvqYVZXn1RUV0PxDfUfqpQ/qpQ3Q/8MA6h6KiIiZNmsTp06crLbNjxw6Cg4NZs2YNCxcuJD4+ntzc3Aqv9evXz916PLf9F9LOGZyWsdls+OyLd3vb7kJ0PxDfUfqpQ/qpQ61f91a3sHK0a+sVaorbrxwaNGjAl19+SXBwcKVlEhISiIiIACAsLIzExESHr11PbGws0dHRREdHk5uba98nOCcnh/z8fDIzMzGZTKSlpWG1WklOTgYgKSkJgOTkZPtm61arlaIiExaLheJiM8XFxZSUlFBUVITRaKSwsBCbzca1a8q9QaPxarl/r127hs1mxWQyUVpaitlspqSkhJKSEsxmM6WlpZhMJmw2K9euXaukDqPdp7S0lKKiIkpKSiguLqa42IzFYqGoyITVarVPfVPcrv2hLsWvqj6VlpZ6tU8GQ0G1+lS+Ls/36erVqx4/Tmr6dO3aNSHPvbI6lHNQzHOvsLAQo9Eo7LmnnH8GVcfp2rVr1frcs1qtpKWlYTKZcBVNVkgXFBQQEBAAQEBAAFlZWQ5fu57x48czfvx4AGbMmFHhvl2TJk0A6N69O4B9i8S7777b/rPy0u3u7pJEIpFogiufe3/8vato8kA6KCjoDxHbSJMmTRy+phVlEVdURPcD8R2lnzqknzpE9wONgkN4eDiHDh0ClFtMoaGhDl/TirKIKyqi+4H4jtJPHdJPHaL7gReCw9mzZ1m9enW51yIjI9Hr9URFRREYGEh4eLjD17RC9Kguuh+I7yj91CH91CG6H9zAO8HNmDHDY1lZJRKJpLbSo0cPvvjiiyrLyUVwDkhNTdVawSmi+4H4jtJPHdJPHaL7gQwODuncubPWCk4R3Q/Ed5R+6pB+6hDdD2RwcEh2drbWCk4R3Q/Ed5R+6pB+6hDdD2RwcIizBXwiILofiO8o/dQh/dQhuh/I4OCQK1euaK3gFNH9QHxH6acO6acO0f1ABgeHNGjQQGsFp4juB+I7Sj91SD91iO4HMjhIJBKJxAEyODigqKhIawWniO4H4jtKP3VIP3WI7gcyODgkKChIawWniO4H4jtKP3VIP3WI7gcyODjkwoULWis4RXQ/EN9R+qlD+qlDdD+QwcEhbdu21VrBKaL7gfiO0k8d0k8dovuBDA4OOXnypNYKThHdD8R3lH7qkH7qEN0PZOI9iUQiuamQifdUIHo6XdH9QHxH6acO6acO0f1AXjlIJBLJTYW8clCB6FFddD8Q31H6qUP6qUN0P3DzlYPZbGbZsmWcP3+ezp07s2rVKnx8fMqVKSgoYMmSJVgsFvr378+8efM4cOAAK1eupHXr1gA899xzdOjQwWlb8spBIpFIqo8mVw47duwgODiYuLg4DAYD8fHxFcp8/fXXdOzYkfXr13P06FHOnj0LQHR0NDExMcTExFQZGDxNSkqKpu1Xheh+IL6j9FOH9FOH6H7g5uCQkJBAREQEAGFhYSQmJjosV1hYiM1mw2azceLECQB2797NxIkTWbp0KTabto9BevTooWn7VSG6H4jvKP3UIf3UIbofuDk4FBQUEBAQAEBAQAAFBQUVykRGRnL16lWW
Ll2Kn58fRUVFhISEsGjRIjZu3MjFixcrvV0UGxtLdHQ00dHR5ObmkpeXR25uLjk5OeTn55OZmYnJZCItLQ2r1UpycjLwv/t7ycnJWK1W0tLSMJlMZGZmkp+fT05Ojr2+rKws0tLS0Ol0WCwWe4Qvq6Ps39TUVMxmM+np6RgMBrKzs9Hr9ej1erKzszEYDKSnp2M2m+1bAl5fR0pKChaLBZ1Oh9FoJCsry6U+ZWRkVLtPRqPRq3366aefqtWnmhwnNX1KSUnx+HFS06eTJ08Kee6V1ZGRkSHsuafT6UhLSxP23NPr9SQlJWl27rmKW585LF++nOHDhzNixAjWrVtHQUEBixcvLlfGYDBgsVho2rQpy5YtY/z48XTr1g1/f3/8/Px48sknGTp0KCNHjnTaliefORiNRnuQExHR/UB8R+mnDumnDi39NHnmEB4ezqFDhwDlFlNoaGiFMklJSbzwwgsUFxdz8uRJevfuTUxMDDt37sRqtZKRkUGnTp3cqVVt8vLyNG2/KkT3A/EdpZ86pJ86RPcDNweHyMhI9Ho9UVFRBAYGEhISwurVq8uVueeeezCbzUyfPp05c+bg7+/PxIkT2bp1K5MmTWLYsGF07NjRnVrVRuRvHCC+H4jvKP3UIf3UIbofQD13Vubn58eaNWvKvfbEE0+U+9nX15e333673GstWrTgk08+caeKKkpKSrRWcIrofiC+o/RTh/RTh+h+IBfBOcRqtWqt4BTR/UB8R+mnDumnDtH9QAYHh/j7+2ut4BTR/UB8R+mnDumnDtH9QAYHh1y+fFlrBaeI7gfiO0o/dUg/dYjuBzI4OKRVq1ZaKzhFdD8Q31H6qUP6qUN0P5DBwSGnTp3SWsEpovuB+I7STx3STx2i+4EMDg7p2rWr1gpOEd0PxHeUfuqQfuoQ3Q9kcHDI0aNHtVZwiuh+IL6j9FOH9FOH6H4gN/uRSCSSmwq52Y8KRN+IQ3Q/EN9R+qlD+qlDdD+QVw4SiURyUyGvHFRQlvJWVET3A/EdpZ86pJ86RPcDeeXgEKvVSp064sZN0f1AfEfppw7ppw4t/eSVgwp0Op3WCk4R3Q/Ed5R+6pB+6hDdD2RwcIjWe1hXheh+IL6j9FOH9FOH6H4gg4NDzp07p7WCU0T3A/EdpZ86pJ86RPcDGRwc0rRpU60VnCK6H4jvKP3UIf3UIbofyODgkMLCQq0VnCK6H4jvKP3UIf3UIbofuHknOLPZzLJlyzh//jydO3dm1apV+Pj4lCtTUFDAkiVLsFgs9O/fn3nz5rn0Pm8i8iwHEN8PxHeUfuqQfuoQ3Q/cfOWwY8cOgoODiYuLw2AwEB8fX6HM119/TceOHVm/fj1Hjx7l7NmzLr3Pm/j6+mraflWI7gfiO0o/dUg/dYjuB24ODgkJCURERAAQFhZGYmKiw3KFhYXYbDZsNhsnTpxw+X3ewmg0atp+VYjuB+I7Sj91SD91iO4Hbg4OBQUFBAQEABAQEEBBQUGFMpGRkVy9epWlS5fi5+dHUVGRS+8DiI2NJTo6mujoaHJzc8nLyyM3N5ecnBzy8/PJzMzEZDKRlpaG1Wq1r0Isy2OSnJyM1WolLS0Nk8lEZmYm+fn55OTk2OvLysrC398fnU6HxWIhJSWlXB1l/6ampmI2m0lPT8dgMJCdnY1er0ev15OdnY3BYCA9PR2z2UxqaqrDOlJSUrBYLOh0OoxGI1lZWS71qXnz5tXuk9Fo9GqfDAZDtfpUk+Okpk8+Pj4eP05q+hQUFCTkuVdWR/PmzYU993Q6Hf7+/sKee3q9npKSEs3OPVdx6wrp5cuXM3z4cEaMGMG6desoKChg8eLF5coYDAYsFgtNmzZl2bJljB8/nq1bt1b5vuvx5AppnU4ndL510f1AfEfppw7ppw4t/TRZIR0eHs6hQ4cA5RZTaGhohTJJSUm88MILFBcXc/LkSXr37u3S+7zJbbfdpmn7VSG6H4jvKP3UIf3UIbofuDk4REZGotfr
iYqKIjAwkJCQEFavXl2uzD333IPZbGb69OnMmTMHf3//Cu8LDw93p1a1+eWXXzRtvypE9wPxHaWfOqSfOkT3A5l4TyKRSG4qZOI9FYi+EYfofiC+o/RTh/RTh+h+IK8cJBKJ5KZCXjmoQPSoLrofiO8o/dQh/dQhuh/IKweJRCK5qZBXDiooW5AiKqL7gfiO0k8d0k8dovuBDA4O6dy5s9YKThHdD8R3lH7qkH7qEN0PZHBwSHZ2ttYKThHdD8R3lH7qkH7qEN0PZHBwSHBwsNYKThHdD8R3lH7qkH7qEN0PZHBwyJUrV7RWcIrofiC+o/RTh/RTh+h+IIODQxo0aKC1glNE9wPxHaWfOqSfOkT3AxkcJBKJROIAGRwcUFRUpLWCU0T3A/EdpZ86pJ86RPcDGRwcEhQUpLWCU0T3A/EdpZ86pJ86RPcDGRwccuHCBa0VnCK6H4jvKP3UIf3UIbofyODgkLZt22qt4BTR/UB8R+mnDumnDtH9QAYHh5w8eVJrBaeI7gfiO0o/dUg/dYjuBzLxnkQikdxUaJZ4z2w2s3DhQsaOHcuKFSuw2SrGnsLCQh5//HGmTp3K66+/DsCmTZsYNWoU06ZNY9q0aVy9etXdai4jejpd0f1AfEfppw7ppw7R/cADwWHHjh0EBwcTFxeHwWAgPj6+QpmvvvqK3r17s379ejIyMvjtt98AWLBgATExMcTExNC4cWN3q7nM3XffrVnbriC6H4jvKP3UIf3UIbofeCA4JCQkEBERAUBYWBiJiYkVyvj5+WEymbDZbBQXF+Pr6wvAxo0bGT9+PC+//LLDumNjY4mOjiY6Oprc3Fzy8vLIzc0lJyeH/Px8MjMzMZlMpKWlYbVaSU5OBv4XpZOTk7FaraSlpWEymcjMzCQ/P5+cnBx7fVlZWfz000/odDosFgspKSnl6ij7NzU1FbPZTHp6OgaDgezsbPR6PXq9nuzsbAwGA+np6ZjNZnt63uvrSElJwWKxoNPpMBqNZGVludSnpKSkavfJaDR6tU8//PBDtfpUk+Okpk8HDx70+HFS06fDhw8Lee6V1ZGUlCTsuafT6fjpp5+EPff0ej379+/X7NxzFbc/c5g7dy4zZswgIiKCuLg4jh8/zsqVK8uVKSkpYcqUKRQWFtK3b1+eeeYZjh8/TklJCb169WLkyJGsXbuW1q1bV9qOfOYgkUgk1UezZw5BQUEYjUYAjEYjTZo0qVDmww8/5JFHHmH79u0YDAaOHj1Ky5Yt6dWrF3Xr1iU4OJjLly+7W81lyr4JiIrofiC+o/RTh/RTh+h+4IHgEB4ezqFDhwDlFlNoaGiFMoWFhfj5+QHg6+tLYWEhr776KsnJyRQVFZGbm6vpPOAePXpo1rYriO4H4jtKP3VIP3WI7gceCA6RkZHo9XqioqIIDAwkJCSE1atXlyszYcIENm3axOTJkzGbzfTt25fZs2fzxhtvMG3aNObNm0dgYKC71VwmIyNDs7ZdQXQ/EN9R+qlD+qlDdD+Q6xwcYjQaCQgI8Ejd7kB0PxDfUfqpQ/qpQ0s/zZ451Aby8vK0VnCK6H4gvqP0U4f0U4fofiCDg0NE/sYB4vuB+I7STx3STx2i+4EMDg4pKSnRWsEpovuB+I7STx3STx2i+4EMDg6xWq1aKzhFdD8Q31H6qUP6qUN0P5DBwSH+/v5aKzhFdD8Q31H6qUP6qUN0P5DBwSFaLsBzBdH9QHxH6acO6acO0f1ABgeHtGrVSmsFp4juB+I7Sj91SD91iO4HMjg45NSpU1orOEV0PxDfUfqpQ/qpQ3Q/kMHBIV27dtVawSmi+4H4jtJPHdJPHaL7QTWCQ0FBARkZGej1+hviSbsajh49qrWCU0T3A/EdpZ86pJ86RPcDF9NnfPTRR+zZsweTycT06dNJSEhg1apV3vCrFJmyWyKRSKqPW9Nn7Nmzhw0bNhAUFMRDDz3E6dOnVQuKjOhb+InuB+I7Sj91SD91iO4HLgaHxo0b
89///hez2czhw4c1zZjqDUTfwk90PxDfUfqpQ/qpQ3Q/cDE4vPjii/z666/ccsst/PDDDzz//POe9tKUsm32REV0PxDfUfqpQ/qpQ3Q/kCm7HWK1WqlTR9yJXKL7gfiO0k8d0k8dWvq59ZnD/PnzVQvdSOh0Oq0VnCK6H4jvKP3UIf3UIbofuBgcunbtyp49e1yq0Gw2s3DhQsaOHcuKFSuw2SpemBQWFvL4448zdepUXn/9dQDy8/OZPn06Y8aM4V//+lc1uuB+OnTooGn7VSG6H4jvKP3UIf3UIbofuBgcUlJSWL58ORMnTuTRRx9l1qxZlZbdsWMHwcHBxMXFYTAYiI+Pr1Dmq6++onfv3qxfv56MjAx+++031q9fz4ABA4iLi+PAgQNkZWXVuFNqOXfunGZtu4LofiC+o/RTh/RTh+h+APVcKfTxxx+7XGFCQgIjRowAICwsjMTERPr161eujJ+fHyaTCZvNRnFxMb6+viQmJrJixQrq1KlDnz59SExMpH379q73xI00bdpUk3ZdRXQ/EN9R+qlD+qlDdD9w8crBYrGwefNm/vnPfxIXF4fFYqm0bEFBgX2Xo4CAAAoKCiqUGTVqFAcOHODBBx+kffv2hISEUFBQQOPGjQFo1KgRBoOhwvtiY2OJjo4mOjqa3Nxc8vLyyM3NJScnh/z8fDIzMzGZTKSlpWG1Wu0zAsrmFCcnJ2O1WklLS8NkMpGZmUl+fj45OTn2+rKysrh8+TI6nQ6LxUJKSkq5Osr+TU1NxWw2k56ejsFgIDs7G71ej16vJzs7G4PBQHp6OmazmdTUVId1pKSkYLFY0Ol0GI1GsrKyXOpTYWFhtftkNBq92qeMjIxq9akmx0lNn3Jzcz1+nNT06erVq0Kee2V1FBYWCnvu6XQ6Ll++LOy5583j5KhPruLSbKWnnnqKtm3b0rt3b44dO8bp06d5+eWXHZZdvnw5w4cPZ8SIEaxbt46CggIWL15crsw777zDn/70J8aOHcuTTz7JpEmTeO2113jqqafo0aMHq1atonPnzowbN65SJ0/OVsrNzeXWW2/1SN3uQHQ/EN9R+qlD+qlDSz+3zlbKzc1lwYIF9O/fn/nz55Obm1tp2fDwcA4dOgQot5hCQ0MrlCksLMTPzw8AX19fCgsL6du3L/Hx8VitVo4cOUJYWJgrah7B19dXs7ZdQXQ/EN9R+qlD+qlDdD9wMTgEBwfz/vvvk5CQwPvvv09wcHClZSMjI9Hr9URFRREYGEhISAirV68uV2bChAls2rSJyZMnYzab6du3L5MnT+bHH39k7NixDBw4kLZt26rrmQqMRqNmbbuC6H4gvqP0U4f0U4fofuDibaWSkhI2b95MZmYmnTp1IioqSvPI58nbSkaj0f7cRERE9wPxHaWfOqSfOrT0c+ttJZvNRs+ePfn73/9uvx1Umzl79qzWCk4R3Q/Ed5R+6pB+6hDdD1wMDk888QTp6enUqVMHvV7P8uXLPe2lKbfddpvWCk4R3Q/Ed5R+6pB+6hDdD1wMDpcvX2bMmDEAzJ07l0uXLnlUSmt++eUXrRWcIrofiO8o/dRRYz+bDX7dDgU57hW6jlo7fl7EpUVwrVq14qOPPqJnz54cO3aMFi1aeNpLU3r37q21glNE9wPxHaWfOmrs9+NrsOcF8AuAIX+HsLlQ16WPoWpRa8fPi7h05fDss89SUlLCrl27aNiwIc8995ynvTRF9I04RPcD8R2lnzpq5Hd0oxIYeoyBdv3h27/D+4PhzGEx/LyI6H7g4mylv/71r0RFRXHgwAHy8/PJy8vjww8/9IZfpchtQiWSG4jMPbBhPLS/BybFQl1f0O2AncvBcA7ungHDV0LDJlqb1nrcOlvp4sWL9O/fn7Nnz/Lyyy9TWFioWlBkRI/qovuB+I7STx3V8ss9Bl9MgxZd4ZH1UM8PfHyg22hYmAARCyE5Bt4KhZQvlOcS3vTTANH9wMXgEBgYyOLFi+nUqRP79u2z
50CqrYi+hZ/ofiC+o/RTh8t+V7KVK4YGgTA5FhrcUv739RvDfS/CnL0Q1A62zIF1oyEv3Tt+GiG6H7gYHF577TXmzZvH4sWLCQ4OrrDiubZRlgRLVET3A/EdpZ86XPIz5cOn48Bigimb4ZZWlZe9tRfM+g4e+BecPwbv9IM9L0KJ64niqu2nIaL7gdwm1CFms5n69et7pG53ILofiO8o/dRRpV9JEXwaBWcPw9QtyrMGVzHqYdf/g2OfQ5MOELkabhvuXj+N0dLPrc8cbjays7O1VnCK6H4gvqP0U4dTP6sVts6D0wdhzLvVCwwAAX+CqPdg2n+hTj34dCzEzgRD5Qk/q+UnAKL7gQwODnGWWFAERPcD8R2lnzqc+n33/+CXLXDvP+D2sTVv5M+DYP5BGPIM6L6CNWGQ8D5YS9X5CYDofiCDg0OuXLmitYJTRPcD8R2lnzoq9Yt/G+Lfgr7zIGKR+obq1YdBf4MF8dCmD+z8G3wwFHKSa+YnCKL7gQwODmnQoIHWCk4R3Q/Ed5R+6nDo98tWZWFbtwfhvlXKdFV30awjTPkSxn0CV3OVAPH136Co4k6TlfoJhOh+IIODRCJxB6cPwZdzIKQvRL0Pdeq6vw0fH7g9ChYdhrA5kPiBsjbieJxb1kZIyiODgwOKioq0VnCK6H4gvqP0U0c5v4snYONECGoLEzeCb0PPNt4gEEa9ArP3QONbYfOjysyoS5mO/QREdD+QwcEhQUFBWis4RXQ/EN9R+qnD7nf1vLKWoV59mBIH/k29J9H6LiVA3P+qkp/p7QjY9wpYzDfO+AmMW4OD2Wxm4cKFjB07lhUrVmBzcKl3+PBhpk2bxrRp0xg+fDjbtm3jwIEDDBs2zP76qVOn3KlVbS5cuKBp+1Uhuh+I7yj91HHhwgUwX4UN46DwEkzaBE3aeV+kTl3oO0e51dQ1En54Ed7phyFlh/ddqoHoxxfcHBx27NhBcHAwcXFxGAwG4uPjK5QJDQ0lJiaGmJgYOnfuTNeuXQGIjo62v96hQwd3alUbLfevdgXR/UB8R+mnjratb4VN0+BCGjwSA63u0Fbollth/CfK1Yu1lDbfz1eegRj12npVgujHF9wcHBISEoiIiAAgLCyMxMTESsuaTCZ1zniUAAAgAElEQVTOnDlDly5dANi9ezcTJ05k6dKlDq84vMnJkyc1bb8qRPcD8R2lnwpsNgq/eEzJtPrgf6BT9VYve5TbhsOCeC50nQHHv4S3+sDhj5SFeQIh9PH9HbcGh4KCAvum2QEBARQUOJ5mBhAfH0/fvn0BCAkJYdGiRWzcuJGLFy9WmhYjNjaW6OhooqOjyc3NJS8vj9zcXHJycsjPzyczMxOTyURaWhpWq5XkZGUudFkGxOTkZKxWK2lpaZhMJjIzM8nPzycnJ8deX1ZWFh06dECn02GxWEhJSSlXR9m/qampmM1m0tPTMRgMZGdno9fr0ev1ZGdnYzAYSE9Px2w22/OoXF9HSkoKFosFnU6H0WgkKyvLpT717Nmz2n0yGo1e7ZOvr2+1+lST46SmT8HBwR4/Tmr61K1bNyHPPYBznz1Ok9M7OddlBtbek4Q793SZp2n0wIvkPBhLcbNu8NUyit8dhOHkQSHOPb1eT2BgoGbnnqu4NbfS8uXLGT58OCNGjGDdunUUFBSwePFih2VXrlzJsGHDGDhwIFeuXMHf3x8/Pz+efPJJhg4dysiRI5225cncSklJSUJnTRTdD8R3lH41JGktbP8LF9uOosXMz9y7lsGN2MfPZoPUWGX9ReFlCJ8Pg59SssGK4KcBmuRWCg8P59ChQ4Byiyk0NNRhOZvNRmJiov3KISYmhp07d2K1WsnIyKBTp07u1Ko2Qv5R/gHR/UB8R+lXA05+CzuWQad7aTF9vbCBAf4wfj4+0OsR5YH13dMhfg2s6avsY63h7Wshj+91uDU4REZGotfriYqKIjAwkJCQEIfpvY8fP85tt91mz0o4ceJEtm7dyqRJ
kxg2bBgdO3Z0p1a1EX0jDtH9QHxH6VdNcpIgdga07AnjPiHpaIrWRk6pMH4NmyjpwGd9Bw2bwhdTYOMEyD8thp+AyJTdEonEOZd/gw9HgF8jeGy3kjX1RqbUAonvKftF2Kww6EklD1Q9P63NvIJM2a2CsgdMoiK6H4jvKP1c5FqessjNZlWmif4eGITxqwSnfnXrKVuTLkpUZlp9/xy8N0BJASKCnyDI4OCAHj16aK3gFNH9QHxH6ecCxYXKrRdDDkz8HJr/71mgEH5OcMkvsA1EfwoTv1D6+sn9sHUhXLskhp/GyODggIyMDK0VnCK6H4jvKP2qwFoKcY/B2SMw9kNo27fcrzX3q4Jq+XUZCQsT4J6lyu5zb90Nyes9ujZC9PEDGRwc0qZNG60VnCK6H4jvKP2cYLPBzifhxFdw/yvQbXSFIrVu/Pz8YfizMO8AtOgK/10Ea0cpK8A9gOjjBzI4OCQvL09rBaeI7gfiO0o/Jxx8Aw5/CP3/ouQtckCtHb8/dYMZX8NDa5Rss+8NgO9WQvE1Mfy8iAwODihb5S0qovuB+I7SrxKObYLdz8Lt42DYs5UWq9XjV6cO3DkFFh2B3hOUYLkmHE58I4afl5DBwQElJSVaKzhFdD8Q31H6OeC3vbB1AbQfAA+/rXxIVsJNMX6NmilXEDN3KtN4N0bD55Oh4KwYfh5GBgcHWAVL0nU9ovuB+I7S7zrOH4cvpiozkqI/VfZncMJNNX7t+sHc/coziYzv4a0wOPQmlNb8A1708QMZHBzi7++vtYJTRPcD8R2l3x8oOAsbxoNfAEyOhYZVb0Rz041fPT9lNtPCBOgwAHY9A+8PhjOVZ572qp8HkMHBAZcvX9ZawSmi+4H4jtLvd0xXlEVuxUaYslmZ++8CN+34NWmnrPmI3gCmfPhoBGz/i5LUTwQ/NyKDgwNatWqltYJTVPkV5Cgbs5uuuE/IAbV6DL2AV/wsZiXH0KUM5VZSsOsLs27q8fPxgW4PwMJEJe1G8np4KxRSPnc5mZ/o4wcyODhE621Kq6JGfkY97HwK/nMnfP0EvHsPnK64U5+7qJVj6EU87me1wtb5kPUjPPwO/HlQtd5+048fQP0AuO9FmLsPmnaALXNh3Wi4WPVGPqKPH8jg4JCyrUtFpVp+hZeVedr/7q0kG+s5HiZsVPbeXTsKfnhJSUSmpaMG3PR+u1fC8TjlIWuv8dV++00/fn+kZU94dBc88AacPwbv9IM9/4CSyjfWEX38QAYHhxw9elRrBae45FdUAD+sgjd6wcF/K5uvLzwMD6+BrqNg7o/QKxr2vawECTenLq4VY6ghHvVLeA8O/QdCH4P+S2pUxU09fo6oUwf6zIRFSXD7WNj/KrwdDum7xfCrATJld23DbISEd5WpdkVXoNuDMHgFBHd3XP5YLOxYqtxHfeBf0HOcd30l3uXX7cqU1S6jIHq9cgUpcT+n9isbI11Kh+4Pw8iX4ZZbtbYCZMpuVYi+EYdDvxITHHpLuX205wUI6Qtz9ikfAJUFBlBuKcz7EVp0gbhZyiIo81XPOArETemXnaAk02vTR0mmpyIw3JTjVx06DIT5B2HIM3DyG+WBdcJ7SkJDEfxcQF453OhYzJAcA/tXg/E8/HmwckKGON6itVJKS2DfK/DjamjSXvnwaC3+VoYSF8lLV6ZdNmyq7IbWqJnWRjcPl3+Dr56AzO/h1juUK/TWd2mmo8mVg9lsZuHChYwdO5YVK1ZgczCt6/Dhw0ybNo1p06YxfPhwtm3b5tL7vElycrKm7VdFcnKy8mGetA7evFuZfdT0zzDjK5i2rfqBAaCuLwx9GqbvUALOR/fCgTdqnLb4hhhDgXGr39UL8GkU1KmnbNjjhsBwU42fWpr+WRn38Wvh6nn4YCj6dTOU54IC49bgsGPHDoKDg4mLi8NgMBAfX3GqZGhoKDExMcTExNC5c2e6du3q0vu8yR133KFp+06xlnJHHZ1ymbp9sbIz15QvYebX
0P4e9fW376+kLe4ySpnRsv4hMORWuxqhx5CbyM9shM8eUXZ0m/SFMuXSDdw04+cufHygxxhYdBjC5tAia5vyN5y62eW1Ed7GrcEhISGBiIgIAMLCwkhMrHxpuclk4syZM3Tp0sXl98XGxhIdHU10dDS5ubnk5eWRm5tLTk4O+fn5ZGZmYjKZSEtLw2q12r89lN3fS05Oxmq1kpaWhslkIjMzk/z8fHJycuz1ZWVlcezYMXQ6HRaLxb6dX1kdZf+mpqZiNptJT0/HYDCQnZ2NXq9Hr9eTnZ2NwWAgPT0ds9lMamqqwzpSUlKwWCzodDqMRiNZWVmV9+mX41hT4zC9fid1ts6nsLQOTPyc5LtexfrnIaT9+qvTPhmNRtf7lHMJw8i3uNT/WWxnDmN9O4KLB9ZVq08HDx6suk8qj1O1+nTdcUpKSvLMcXJTn9LS0tSfe4VGrn4cBedTSb/zGWh9t9v6pNPpvHKcavr3dOzYMTHPPUMR+j5PcDz835Q2+hPEzcK67iFOxO/02rnnKm595jB37lxmzJhBREQEcXFxHD9+nJUrVzosu2fPHg4dOsQzzzxTrfeV4clnDiaTiYYNG3qk7mpjs8GJnfDDi3DhODTvgrn/E9TvPc5p1ky3cfEkxD0K51OVqY/3/gN8qx4bocbQAbXez2aD/z4OP6+H0f+Gu2e4zQ1ugvHzMCaTiYb1/eDIx/D988qt3AF/hXuWVJn0UC2aPHMICgrCaDQCYDQaadKkSaVl9+3bx8CBA6v9Pm9w7tw5TdsHlD/ujN3wwVD4fCKUFELUB7AgnrO33O2dwADQojM89r2SJuDwh0qysQu/VPk2IcbQCbXeb98/lcAw8Em3Bwa4CcbPw5w7d06ZLRY2W7nV1O0B2LtKWUD3216t9QA3B4fw8HAOHToEKLeYQkMdPxi12WwkJibSt2/far3PWzRt2lTT9sk6oGx2/ulYuHYRHnxLWcDW6xGoU9f7fvXqK2kCpsQpK67fHwIJ7zu9V6r5GFZBrfZLXg97X4I7JsOQv7tP6g/U6vHzAuX8GreEcR8rzw6tpRDzEMTNVlLeaIhbg0NkZCR6vZ6oqCgCAwMJCQlh9erVFcodP36c2267jfr16zt8X3h4uDu1qk1hYaE2DZ9JhHUPwtpIuHwKRq2Gx5PgrqlQt572frcNh/mHlDw8O/8Gn0UrDzodoJmji9Rav/TdSpbQjkOV20k+Pu4V+51aO35ewqHfbcNgQTwMWg5pW+HNPnD4oxrPGFSLXOfggNzcXG691YurGc8dVZ4ppO8C/+YwYBn0ebTSe/te97sem01Z0PPd/4OGTWDMu8qH0R/Q3LEKaqXfuZ/hk0ho9mdl97L6jT0jRy0dPy9SpV9eOny1TFlp3bqPsjbi1l5uaVuukFaBr6+vdxq6kKakTH5/kHLVMOz/4C8pELHQ6UNfr/lVho8PhM+D2T8owWH9GGXzE0uxvYjmjlVQ6/zys2DDI+DfFCZv9mhggFo4fl6mSr/mnWDaf5XnjFdOK58R3/zdLdkLXEUGBweUPRz3GHkZsHmW8vApcy8MegqWHFNmK9SveuNxj/u5SsvblQDR51Ell9NHw5VvPAjkWAm1yq/wsrJhT2mx8lyocUvPif1OrRo/DXDJz8dHec646LAyqeCnt5UtStO2eWVthAwODmjevLlnKs7PUnIXrQmFE18r09aWHIMhK6BBoPZ+NcHPX7nkjd4AV7LhvYGQvJ7mzcROzyDUGDrAZb8SE2ycoIz9xM+VHFleoNaMn0ZUy69hE+VvbNZ34N8MNk2D7/7Pc3K/I4ODA86ePeveCgtyYPsSJdVF6mboO1+5fTT8WeU2gNZ+7qDbA8rD6tZ3w38XYd00XdlGUVCEHMM/4JKftVRJpHcmEaLeh3YRnhf7nVoxfhpSI7+QUJizF+57SVlt7WHkA2kHWCwW6tWrV3XBqrh6AQ68Dkc+AZsV7p6u3Dq6Rd0WgW7z8wTWUjj4b2w/vIhPQEsY
+wG066e1VQWEHkNc8LPZYOdyZQOn+16CiAXek6MWjJ/GaOknH0ir4Jdfql7k5ZTCy8pl3797K/s193pEmZIa+ZrqwOAWP09Spy4MWEb6gLeUZH5rI5VNhzyw25wahB5DXPA79KYSGCIWeT0wQC0YP40R3Q/klYN7MV2B+DXw0ztQbFS25Bz8FDTrqLWZNpivwtdPQspn0CZMuYpo0l5rqxuf1M3K3hs9xsDYj723Wl5SK5BXDiqo9kYc5qvKtoD/7gX7X4HbhiqLWcZ+4JHAcCNsFJKUlKRMpxzzDoz9CC7q4N0BygebAIg+hpX6nfoRts6Hdv3h4Xc1Cww37PgJguh+IK8c1FFcCEc+ggP/gsJL0Pl+JV2Bmxar1Crys5SUAGcTofckGPWKx+fi1zoupMHHI5WpqrO+VWaxSCTVRF45qKDKqG4xK7mF/nOHsvirZS8lOd2kz70SGG6Ebx0VHJu0V1btDloOxz5XriJytOuH6GNYwc9wDjaMUxZHTtmseWC44cZPMET3A3nlUD1KS+DoBtj3KhjOQtt+MPQZZYMcieucPvR7YrHzyvj1+4u8b+6MogL4ZBTkn1Y2dZJXphIVyCsHFZRtJmLHWgpHN8JbfZSkZo1bwtQtv+++5v3AUMFPQJw6tusH8w9A10jY/ezvu815N8Wy6GNo97MUwxdTlWc20THCBIYbZvwERXQ/kMHBIZ07d1b+Y7XC8Th4Oxy2zlPukU/8Ah7brSSa81DGS5f9BKZKx4ZNYPw6JR352SNKKhHdV96RQ/wx7Ny5s7KWYdtCOLVPGafrkhtqyQ0xfgIjuh/I4OCQ7NOnlQ+q9wbA5kfBpw48EgNz9kOXkZoFBbtfdram7buCS44+Pko68rn7IagtfD4JdixTHvSL4Kch2dnZyg5hqZuUW293TNRaqRw3xPgJjOh+AOIuIdQCmw0yvufP3z8HF45B0z9D1Idwe5SyuEsQgoODtVaokmo5Nu+k5I3Z84KyuOv0QWX6a8vbxfDTgDbnv1VW1989EwY8obVOBUQfP+mnHnnlUMapH5VpghvGYruWBw+t+X33tfFCBQaAK1euaK1QJdV2rFdf2Z966hYlJ9MHQ+Gndz2WfVLoMdR9RYM9/w86j1Q2fNL4StURQo8f0s8dyOCQnQDrRsO6B5S86ZGvcXnyd3DnlHK7r4lEgwYNtFaokho7dhz6+25zg+Gb5fDZI2C86E41QOAxPHMYNs/C0uJ2ZetIeQ7WCOmnHrefeWazmWXLlnH+/Hk6d+7MqlWr8HHwzefjjz9m3759NGzYkDfffJMtW7awdu1aeyrbNWvW0LixBxdJnfsZ9rwIGd9BoxZK8rI+M5V55Hpt92696WnUHCZ9oeSl2vWM8rB6zDvKNqW1mUuZsDEaGrfkyqj3aeHXSGsjyU2M268cduzYQXBwMHFxcRgMBuLj4yuUOXPmDJmZmaxbt44BAwZw/vx5ABYsWEBMTAwxMTGeCwwXT8Lnk+H9wZBzREmb/ZcUJXnZ77uvFRUVeaZtNyG6H7jB0ccH+s6BOT8oOew/HQvfPq0sQBTBz90YL8KnUcr/p8RhqiN2YBBu/K5D+qnH7cEhISGBiAglr3xYWBiJiYkOyxgMBqZPn05SUhJt2rQBYOPGjYwfP56XX37ZYd2xsbFER0cTHR1Nbm4ueXl55ObmkpOTQ35+PpmZmZhMJtLS0rBarSQnJwP/W42YnJyMNf80pZk/UHLP3/jtwW3k95hBzsUr9vqysrLw8/NDp9NhsVhISUkpV0fZv6mpqZjNZtLT0zEYDGRnZ6PX69Hr9WRnZ2MwGEhPT8dsNtvnNF9fR0pKChaLBZ1Oh9FoJCsry6U+BQUFle+T1UpaWhomk4nMzEzy8/PJyckp1yej0ejVPuXn51erTxWOU1mfbvkzvw3/iKJeUyH+LUreHUx+eqLqPlmtVo8fp0r7dN1xOp+dSUlMFFbDeQrHrEV3sYSAgAAhz72yOoKCgoQ993Q6HX5+
fm4/Tu7sk8lk0uzccxW3r5CeO3cuM2bMICIigri4OI4fP87KlSvLlfnggw84c+YMzz//PJMnT2bZsmXUr1+fkpISevXqxciRI1m7di2tW7eutB1HK6RLSko4e/Zs1VHZZlWmp1ZCSUmJ0HvQiu4HimPjxo1p06aN+1x1Xyvz/i1FMPJluGtajR/Wpqen06lTJ/d4qaHUokzhzfhO2U2v6yhAIL9KkH7q0NLP1RXSbn/mEBQUZN8f1Wg00qRJxRwwjRo1on379gC0adMGvV5PaGgoTZo0oW7dugQHB3P58mWnwcERZ8+epXHjxrRv397hcw5XsVqt1BE4nYPofgClpaXk5+dz9uxZOnTo4J5Ku46CVodgy1zYvhgyv4fR/65RnqG2bdu6x0kNNht8tQzSv4XI1+2BAQTxc4L0U4fofuCB20rh4eEcOnQIUG4fhYaGVijTvXt3jh8/DijPH9q0acOrr75KcnIyRUVF5Obm1mjwioqKaNasmarAUFaPyIjuB8rEhGbNmrnf9ZZbYepWGP6cslDxnXsg62C1qzl58qR7vWrCj6sheR3cswxCZ5X7lRB+TpB+6hDdDzwQHCIjI9Hr9URFRREYGEhISAirV68uV+aOO+6gSZMmTJgwgfbt29OzZ09mz57NG2+8wbRp05g3bx6BgYE1al9tYADw9/dXXYcnEd0PFEd3HAuH1KkD9yyBWbugnp8yDXnPP6q121zPnj094+YqRz9TnHtFw7CKm8Vr7lcF0k8dovtBLcvK+uuvv9KtWzfVdV+7do1GjcSdLSK6H/zP0V3HpFLMRmUv5aOfVmu3uaSkJO6++27PeTkj43tl/Ua7/jB5sxLgrkNTPxeQfurQ0k9mZVWBmg/eo0ePcvToUTfawOOPP17uZ9EDA3jRsX4APLym/G5zx2KrfJtmHxy5x2DTNGjRFaLXOwwMoKGfi0g/dYjuB7U4t9Jz238h7ZyhRu+1Wkup4yBlRvdWt7BydA+n7y0LDHfccUeN2nbEm2++We7nG+nKwWv0HAdtQuHL2fDlY5CxGyJXV7rbnCbf3K5kKxv2NAiCybHQoPJbp/Kbrzqkn3pqbXBQg6PA4ArLly9ny5YtAKxdu5a9e/cCMHjwYCIiIvj555/55ptvyM3NZdKkSZSUlDBo0CBefPFFZsyYQceOHfn666/x8fFhz5499iX2gwcPttf17LPPYrFY+OGHHzAajXz77bcEBgYyZswY8vLy6Nq1K926dePpp5+u4FdUVMT48eO5dOkSrVq14vPPP8disTBjxgxOnz5NixYt+OKLL6hTpw4zZswgOzubdu3asXbtWvz8/Cr048KFC0yfPp38/HwefvhhVqxYYW9Lk+DVpB3M+FrZz3v/K3AmQbmiaFPxj9Drf5iFl+HTcVBSBLO2wS2tnBYX/YND+qlDdD+oxcGhqm/4zigsLKzRQ99//vOf9vvrM2bMsL+ekJDA4sWLeemllwBlhtZzzz3HnXfeSb9+/XjxxRcBJRlXfHw88+fPJzk5mX79+jlsJy0tjQMHDvDKK6+wZ88eunXrRkhICNu3b+eee+7h008/dfi+X375BR8fHw4dOsQ333yD0WgkJiaG3r178/nnn/Phhx9y7NgxDh8+TPfu3dm4cSMrV67kk08+Ye7cuRX68dJLLzFhwgRmzJhBeHg4c+bMoVmzZqrGUDV168GQFUpupi9nw8f3Kvt6919SLoFiSkoKvXv39o5TSZGyKj//FEz5Ev5U9TMYr/rVAOmnDtH9QD5zcEjDhg3dWl+PHj2Iioqy/1y/fn1ee+015s6da18TAjBz5kxASedbXFxcaX0zZ87Ex8fHXq5169YkJyczaNAgFi9eXOn77rrrLnr27Mno0aPZuXMnjRo1QqfTERYWBsCjjz5Knz59SEtLs69yj4iIIC0tzWE/Tpw4wTvvvMPgwYMxGo2cO/e/3dzcPYbVpl0EzDsA3UYr+yLElN9trkePmn95qBZWq7IuI/sQPPwO
dBjg0tu85ldDpJ86RPcDGRwcomZufsOGDbl27RoAtt/TTQcEBJQrs3r1ap588knef//9ctM9ry9XGX5+5R9ifvPNNzzzzDMcOnSIyZMnV/q+o0ePEh4ezvbt28nLy2P//v107dqVhIQEAF588UU++eQTevTowU8//QTATz/9ZD+Rr/fr0qULL7/8Mnv37uWJJ54ot+BRiLUYDYNg3CdK+vWcZCWB3687AMjIyPCOw65nIG0rjHhBeS7iIl7zqyHSTx2i+4EMDg65/sO3OowYMYK4uDgiIiI4cOCAwzKjR49m9uzZjBkzhkaNGpX7xu0K9eqVvxt45513smDBAgYNGsT48ePtCwyvp0OHDrz55puEhYVx7tw5+vTpw5w5czh69Cj33HMPP//8M1OmTOGxxx7jl19+oX///pw8ebLcLbI/8tRTT/Hqq68SHh7O7t27admypf13asbQrfj4KOnX5+6HoHbwxWTYvoQ2wc0833b8GvhpDfSdB/0er7r8HyjLNyYq0k8dovuBXOfgELPZTP369VXX4ymu9/vggw9Yt24d9evXx9/fn7/+9a8MHjxYO0H+5+jxdQ7VwVL8+25z/6E48M/4TYyBlh5ajPTLFoidCd0eUPbKruYkh6ysLHuKGRGRfurQ0k+z3Eq1AdHzFl3vN3v2bGbPnq2RjWOEHMN6fnDvC9BxKHXj5ii7zY14Xvlm787V3FkH4cs5ENIXoj6o0U6Crt5i1Arppw7R/UDeVnKIzUNbU7oL0f1AcMeOQ8gbtxU6DoNvnoIN492325xeB59PVG5hTdxo3yOkupSUlLjHx0NIP3WI7gcyOEhuUiz1g5QP71Gr4dR+5WF1xm51lRpylUVudevDlM3g37TGVVmtVnUuHkb6qUN0P5DBwSFC3hL5A6L7gfiO/v7+yq2ksNkwZ6+yNama3eaKDMoVSOFlmLzJpfxOVfoJjPRTh+h+IIODQywW17N7aoHofiC+4+XLl//3Q3B3mL0HQmdD/Fvw4TBlO1lXKS1R8iXp0+CRGGh1p3v9BET6qUN0P5DBwSHemoZZ0xlFwkwTdYLojq1aXZe+wrehkotp4udQkAPvD4KkdcqGPM6w2eC/j8NvPygbD3Ua7hk/wZB+6hDdD2rzbKWdT8H51Bq91WYtdTzDpGVPuN/x/tbexGw2a78CuQpEdzx16hTdu3ev+Isu98P8Q7B1nrLbXMZu5UO/sucHP7wIKRth8Aq4a6rn/QRB+qlDdD+QVw4Oqen98tdff92e1+jNN98kJiaGa9euERkZycCBA+3pMVzl4sWLDBkyhPDwcBYsWABAXl4e48aNo2/fvsycOROr1cqlS5eIjIwkIiKCJUuWAMo86smTJ/PYY4/x6KOPAsruU4MHD6ZPnz7ExMTUqI+uUpY0UFS6du1a+S9vuRWmbFGmuZ74Gt69B7IcLGg88omS5O/OqTBouff8BED6qUN0P/DAlYPZbGbZsmWcP3+ezp07s2rVKoc7gn388cfs27ePhg0b8uabb2I0GlmyZAkGg4GBAweydOlSdSIqvuEX1jDd9Lhx43j66aeZMmUK33//PWvXriUnJ4e5c+dy7733ct9993HhwgWCg4Ndqm///v3cfvvtvPnmm2zatAmr1cqqVasYN24cM2fO5KWXXuL06dOsWbOGCRMmMHXqVKZPn863335Lly5d2L59O7t27SI8PByAJ598kmeffZZ+/fpx5513MnXqVI/t1lZYWCh0WvGjR49y1113VV6gTh3o/xdoPwDiHoO1D8DAJ5QgUNcXTnyj7P982wh44F/uXSfhip/GSD91iO4HHrhy2LFjB8HBwcTFxWEwGIiPj69Q5syZM2RmZrJu3ToGDBjA+fPnWb9+PQMGDCAuLo4DBw6QlZXlbjWXqemHWtu2bbl8+TLXrl2jbt26BAUF0aBBA9avX8/UqVO5cuUKJpPJ5fruv/9+AB544AF+/fVX6tSpg06nY+DAgYDyYd+uXbtKE+Xde++9
9sAAypXDypUruffeeyktLeXKlSs16qcriBwYANf/MFvfpaTeuHOycpXwyf3K6ufNM5XbjOPXKsFCK/6MdP0AAArfSURBVD+NkH7qEN0PPBAcEhIS7B9UYWFhJCYmOixjMBiYPn06SUlJtGnThsTERCIiIqhTpw59+vRx+D5vUZY4ryYMGjSIV199lcjISEBJbfHwww/z2WefVfsD8+DBg0ycOJEdO3awa9cuMjMz6dq1K/v37wdgzpw57Nmzx+VEeZ07d7bvMzFv3jyPPjRWM4beICkpyfXC9QOU5H3jPlFmMcXOUKa+TopVfqe1nwZIP3WI7gceCA4FBQX2D6WAgAAKCgoqlMnPz6dJkyasW7eOCxcukJycTEFBAY0bK7t2NWrUCIOh4i5usbGxREdHEx0dTW5uLnl5eeTm5pKTk0N+fj4lJSVYrVZMJhM2m83+AfXHf202GyaTCavVSlFRERaLheLiYoqLiykpKcFsNtOgQQN7HYWFhRXqAOW2SVkdpaWlmM1mSkpKeOihh3j99dcZPXo0RUVFDBs2jH/84x8MGzYMm83GuXPn7HVYrVa7zx/rKPPp0KEDTzzxBKGhoTRr1oy2bduyePFiNm3aZA+k4eHhPPXUU2zYsIGIiAgCAgIYOnQoxcXFWK1We59KS0t59tlnmTVrFnfddRdnzpyxj2tVffpjHUVFRVitVqfjYrPZqFOnDqWlpVgslgrHKTMzE5PJRFpaGlarleTkZOB/fzDJyclYrVbS0tIwmUxkZmaSn59PTk6O/bhnZWVhNBrR6XRYLBZSUlLK1VH2b2pqKmazmfT0dAwGA9nZ2ej1ekJCQsjOzsZgMJCeno7ZbCY1NdVhHSkpKVgsFnT1unNt2i4M3SZy+YGPyTVaPdan3r17V7tPer2++n3S6TAajWRlZVXrON19991eOU417VOXLl2q3SdvnXt6vZ4WLVp45Tg56pOruD3x3vLlyxk+fDgjRoxg3bp1FBQUVNhj4LPPPqOoqIhHH32U5cuXM3jwYD777DOeeuopevTowapVq+jcuTPjxlWe4tiTifdE34ZTdD/4n6NQiff+QHJystCX9tJPHdKvclxNvOf2K4fw8HAOHToEKLePQkNDK5Tp3r27Pa30mTNnaNOmDX379iU+Ph6r1cqRI0fsG9BogeirF0X3A/Ed3bnHtyeQfuqQfupxe3CIjIxEr9cTFRVFYGAgISEhrF69ulyZO+64gyZNmjBhwgTat29Pz549mTx5Mj/++CNjx45l4MCBtG3btkbtuyPhmxAb1ThBdD9QHEVOvqfT6bRWcIr0U4f0U0+t2s/h1KlTNG7cmGbNmqmaomm1WoXODSS6H0BpaSn5+flcvXqVDh06aK1TAZPJJPQiPemnDulXOTflfg5t2rTh7NmzXLyoLv1ySUkJvr7un57oLkT3AyW3UkBAgLA7Xp07d46OHTtqrVEp0k8d0k89tSo4+Pr6uuVbatlsKlER3Q/Ed2zatObptL2B9FOH9FOP2PcmNKJsmqaoiO4H4jtKP3VIP3WI7gcyODhE9Pv5ovuB+I7STx3STx2i+4EMDg4R/X6+6H4gvqP0U4f0U4fofnADP3PIzMy0p4lwN6LfLxfdD8R3lH7qkH7q0NLv3LlzLpW7YaeyepLo6GiXpnppheh+IL6j9FOH9FOH6H4gbytJJBKJxAEyOEgkEomkAnUXLFjwrNYSIuKp5xnuQnQ/EN9R+qlD+qlDdD/5zEEikUgkFZC3lSQSiURSARkcJBKJRFKBmzY42Gw2nn76aSZPnszjjz+OxWKpUObAgQMMGzaMadOmMW3aNE6dOuU1P1faNpvNLFy4kLFjx7JixQqvp8g+fPiw3W/48OFs27atQhmtxrCkpIRFixYBro+TN8fzj36unIvg3bH8o5+r7Wo1fq6ch9XphxquP5aFhYXCnXuuctMGh59//hmLxcKGDRswGo32DYquJzo6mpiYGGJiYryeerqqtnfs2EFwcDBx
cXEYDAbi4+O96hcaGmr369y5M127dnVYzttjWFRURHR0tH08XB0nb43n9X6unovgnbG83s/VdrUaP1fPQ1f7oYbrj+WWLVuEOveqw00bHJo1a8aUKVMA50vZd+/ezcSJE1m6dKnXo3lVbSckJBAREQFAWFgYiYmJXvUrw2QycebMGbp06eLw994ewwYNGvDll18SHBwMuD5O3hrP6/1cPRfBO2N5vZ+r7Wo1fmVUdR6C58fv+mP59ttvC3XuVYebNji0a9eOnj178v3331NSUkL//v0rlAkJCWHRokVs3LiRixcvVthcyJO40nZBQQEBAQEABAQEUFBQ4DW/PxIfH0/fvn0d/k7LMSzD1XHSajxdORdBu7F0tV2tz0dn5yF4Z/yuP5bdu3cX+txzxk0bHAB++OEHPv30U9566y3q1q1b4feBgYGEh4cD0KpVKy5duuQ1N1faDgoKwmg0AmA0GjXL1bJv3z4GDhzo8HdajmEZro6TluNZ1bkI2o2lq+1qfT46Ow/Be+P3x2PZtGlT4c+9yrhpg0NeXh5r165lzZo1NGrUyGGZmJgYdu7cidVqJSMjg06dOnnNz5W2w8PD7fenExISCA0N9ZpfGTabjcTExEq/sWk5hmW4Ok5ajacr5yJoN5autqvl+VjVeQjeGb/rj6Xo554zbtrgsG3bNi5evMi8efOYNm0aX375JatXry5XZuLEiWzdupVJkyYxbNgwr27rd33b9evXr+AXGRmJXq8nKiqq3Lcib3L8+HFuu+026tevz9mzZ4UawzIcjZMjV63G8/pzccuWLUKNpaN2RRo/KH8eApqN3/XHsqSkROhzzxlyhbREIpFIKnDTXjn8/3buHqSRKIzC8JtFxFhYKKIWQcHAIAxokUJBsLESC6NNwB+IpFJQkIBFLCzUMnYxKKIoKYQUNmIjsUgjDP6ARcbORghJUDCFiCJbyIbdnSiLuuxCztOF+b47d4rJSe5cRkRE3qZwEBERB4WDiIg4KBxERMRB4SAVybZtbNv+UG+hUGBjY+NL57OysvKl44l8lnYrSUXa398HYGho6B/PROT/pHCQihONRkmlUgA0NjaytbUFQDAYpLOzE9u2icfj5PN55ufneX5+xufzMTMzA8DNzQ2xWIzl5WUAIpEIHo+HdDqNy+Vic3OztN/+Z7e3t4TDYR4fH+no6GBhYaF0LBgMluZxcnJCLBYDXvfvHx4eUldXRyQSIZ/PYxjGL70if4OWlaTizM3NEQqFCIVCpS9kgMvLS0zTJB6PA5DNZpmammJtbY3j4+N3xywWiyQSCQzDIJPJlK05PT3F6/WSSCTw+Xy8vLyUrevu7mZnZwe/308gEKCpqYlkMonX62V3d5dCocDV1dUHr17kz1T96wmI/C/a29vp7+8vfa6urmZ9fR23283Dw8O7vT+WpxoaGnh6eipb09vbi2VZTE9PY5om3769/dssk8lwcHBQCqrr62suLi6wLItisUgul3v37aMin6VwkIpUU1PD3d0d8PpeHpfLRW1t7S8129vbTE5OYhgGw8PD7473e2855+fnDAwM0NXVxfj4OIODg3g8Hkfd/f09S0tLrK6uUlX1eou2tbVhmiZ+v59UKkVzc/OfXqrIh2hZSSpST08PR0dHjI6OcnZ2Vramr6+PxcVFZmdncbvd5HK5T52ztbWVaDRKIBCgvr6elpaWsnV7e3tks1nC4TATExNYlsXIyAjpdJqxsTGSyeSbvSJfRQ+kRUTEQf8cRETEQeEgIiIOCgcREXFQOIiIiIPCQUREHBQOIiLi8B2wVPzUKN1+JAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "best_clf\n",
      "------\n",
      "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
      "                       max_depth=5, max_features=None, max_leaf_nodes=None,\n",
      "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "                       min_samples_leaf=1, min_samples_split=2,\n",
      "                       min_weight_fraction_leaf=0.0, presort='deprecated',\n",
      "                       random_state=42, splitter='best')\n",
      "\n",
      "Unoptimized model\n",
      "------\n",
      "Accuracy score on validation data: 0.7143\n",
      "Recall score on validation data: 0.6667\n",
      "F-score on validation data: 0.6667\n",
      "\n",
      "Optimized Model\n",
      "------\n",
      "Final accuracy score on the validation data: 0.7143\n",
      "Recall score on validation data: 0.6667\n",
      "Final F-score on the validation data: 0.6667\n"
     ]
    }
   ],
   "source": [
    "# Train, tune and save the model\n",
    "search_model(X_train, y_train,X_val,y_val, model_save_path='./results/my_model.m')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
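    "On the validation set, the optimized model scores exactly the same as the unoptimized one (accuracy 0.7143, recall 0.6667, F-score 0.6667), so tuning brought no measurable gain here. With so few samples, also watch out for overfitting."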
   ]
  },
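  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With only 39 samples, a validation split of a few samples gives a noisy estimate, so identical validation scores say little about overfitting. The following is a minimal sketch, assuming synthetic stand-in data from `make_classification` (not the real dataset) and illustrative `max_depth` values. It uses stratified 5-fold cross-validation to report the mean training and validation F1 for each depth. A large gap between the two suggests overfitting.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import StratifiedKFold, cross_validate\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "\n",
    "# Synthetic stand-in for 39 samples with 10 selected features (assumption)\n",
    "X, y = make_classification(n_samples=39, n_features=10, random_state=42)\n",
    "y = np.where(y == 1, 1, -1)  # the notebook's label convention: 1 / -1\n",
    "\n",
    "cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)\n",
    "results = {}\n",
    "for depth in [1, 3, 5, None]:\n",
    "    scores = cross_validate(DecisionTreeClassifier(max_depth=depth, random_state=42),\n",
    "                            X, y, cv=cv, scoring='f1', return_train_score=True)\n",
    "    # Mean F1 over the 5 folds, on the training folds and on the held-out folds\n",
    "    results[depth] = (scores['train_score'].mean(), scores['test_score'].mean())\n",
    "    print(depth, 'train F1 = %.3f, val F1 = %.3f' % results[depth])\n",
    "```\n",
    "\n",
    "An unrestricted tree (`max_depth=None`) fits the training folds perfectly on this data (training F1 of 1.0), so its validation score is the number to compare."
   ]
  },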
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Predict on the test data to assess the model's performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_and_model_prediction(X_test,y_test,model_path):\n",
    "    \"\"\"\n",
    "    Load and evaluate a model\n",
    "    :param X_test,y_test: test data\n",
    "    :param model_path: path and file name of the model to load; use the model you consider best\n",
    "    :return:\n",
    "    \"\"\"\n",
    "    # Load the model\n",
    "    my_model=joblib.load(model_path)\n",
    "\n",
    "    # Predict on the test data\n",
    "    copy_predicts = my_model.predict(X_test)\n",
    "\n",
    "    print (\"Accuracy on test data: {:.4f}\".format(accuracy_score(y_test, copy_predicts)))\n",
    "    print (\"Recall on test data: {:.4f}\".format(recall_score(y_test, copy_predicts)))\n",
    "    print (\"F-score on test data: {:.4f}\".format(fbeta_score(y_test, copy_predicts, beta = 1)))"
   ]
  },
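  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this dataset, positive samples are labeled 1 and negative samples -1. `recall_score` and `fbeta_score` use `pos_label=1` by default, so the recall and F-score reported above describe the positive class. The sketch below uses hypothetical toy labels. They represent one configuration (8 samples, 3 positives) that yields the same three metrics as those printed for the test set. The actual composition of the test set is not shown in this notebook.\n",
    "\n",
    "```python\n",
    "from sklearn.metrics import (accuracy_score, confusion_matrix, fbeta_score,\n",
    "                             precision_score, recall_score)\n",
    "\n",
    "# Hypothetical toy labels in the {-1, 1} convention (1 = positive)\n",
    "y_true = [1, 1, 1, -1, -1, -1, -1, -1]\n",
    "y_pred = [1, 1, -1, -1, -1, -1, -1, -1]\n",
    "\n",
    "# With labels=[-1, 1], ravel() yields tn, fp, fn, tp\n",
    "tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()\n",
    "acc = accuracy_score(y_true, y_pred)      # (tp + tn) / n = 7 / 8\n",
    "rec = recall_score(y_true, y_pred)        # tp / (tp + fn) = 2 / 3\n",
    "prec = precision_score(y_true, y_pred)    # tp / (tp + fp) = 2 / 2\n",
    "f1 = fbeta_score(y_true, y_pred, beta=1)  # 2 * prec * rec / (prec + rec)\n",
    "print('acc=%.4f recall=%.4f precision=%.4f F1=%.4f' % (acc, rec, prec, f1))\n",
    "# prints: acc=0.8750 recall=0.6667 precision=1.0000 F1=0.8000\n",
    "```"
   ]
  },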
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy on test data: 0.8750\n",
      "Recall on test data: 0.6667\n",
      "F-score on test data: 0.8000\n"
     ]
    }
   ],
   "source": [
    "# Load the model and evaluate it on the test samples\n",
    "model_path=\"./results/my_model.m\"\n",
    "load_and_model_prediction(X_test,y_test,model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3. Test Submission"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
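    "By working through the steps above, you should now have a basic understanding of supervised learning.  \n",
    "However, the feature selection method is simple and model tuning follows no particular strategy. The accuracy, recall and `F-score` are also not high, and the model suffers from machine learning problems such as overfitting.  \n",
    "Try writing your own feature selection method, then train and tune your own supervised learning model to get the best possible results.  \n",
    "\n",
    "If you need to **save data or models** during training, write them to the **results** folder."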
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
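    "## 3.1 Training a machine learning model\n",
    "\n",
    "The supervised learning workflow covers data processing, feature selection, model training and tuning, model saving, and model evaluation.  \n",
    "If you are not satisfied with the trained model, retrain it by changing the data processing method, the feature selection method, or the model type and parameters until you get a model you are satisfied with.  \n",
    "Once you are satisfied with your model, you can submit it for testing!  \n"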
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def processing_data(data_path):\n",
    "    \"\"\"\n",
    "    Data processing\n",
    "    :param data_path: path to the dataset\n",
    "    :return: feature1,feature2,label: processed feature data and label data\n",
    "    \"\"\"\n",
    "    feature1,feature2,label = None, None, None\n",
    "    # -------------------------- Implement data processing here ----------------------------\n",
    "\n",
    "    # ------------------------------------------------------------------------\n",
    "    \n",
    "    return feature1,feature2,label\n",
    "\n",
    "\n",
    "def feature_select(feature1, feature2, label): \n",
    "    \"\"\"\n",
    "    Feature selection\n",
    "    :param  feature1,feature2,label: processed input feature data and label data\n",
    "    :return: new_features,label: feature data and label data after feature selection\n",
    "    \"\"\" \n",
    "    new_features= None\n",
    "    # -------------------------- Implement feature selection here ----------------------------\n",
    "\n",
    "    # ------------------------------------------------------------------------\n",
    "    # Return the selected data\n",
    "    return new_features,label\n",
    "\n",
    "\n",
    "def data_split(features,labels):\n",
    "    \"\"\"\n",
    "    Data splitting\n",
    "    :param  features,labels: input feature data and label data after feature selection\n",
    "    :return: X_train, X_val, X_test,y_train, y_val, y_test: training, validation and test data after splitting\n",
    "    \"\"\" \n",
    "    X_train, X_val, X_test,y_train, y_val, y_test=None, None,None, None, None, None\n",
    "    # -------------------------- Implement data splitting here ----------------------------\n",
    "\n",
    "    # ------------------------------------------------------------------------\n",
    "\n",
    "    return X_train, X_val, X_test,y_train, y_val, y_test\n",
    "\n",
    "\n",
    "def search_model(X_train, y_train,X_val,y_val, model_save_path):\n",
    "    \"\"\"\n",
    "    Create, train, tune and save a machine learning model\n",
    "    :param X_train, y_train: training data\n",
    "    :param X_val,y_val: validation data\n",
    "    :param model_save_path: path and file name for saving the model\n",
    "    :return:\n",
    "    \"\"\"\n",
    "    # --------------------- Implement model creation, training, tuning and saving here ---------------------\n",
    "\n",
    "    # Save the model (fill in the model path and file name)\n",
    "    # -------------------------------------------------------------------------\n",
    "\n",
    "    \n",
    "def load_and_model_prediction(X_test,y_test,save_model_path):\n",
    "    \"\"\"\n",
    "    Load and evaluate the model\n",
    "    You can implement, for example, parameter selection during model tuning and test-set metrics such as accuracy, recall and F-score.\n",
    "    Main steps:\n",
    "        1. Load the model (use the best model you trained)\n",
    "        2. Evaluate the model you trained\n",
    "\n",
    "    :param X_test,y_test: test data\n",
    "    :param save_model_path: path and file name of the model to load; use the model you consider best\n",
    "    :return:\n",
    "    \"\"\"\n",
    "    # ----------------------- Implement model loading and evaluation here -----------------------\n",
    "\n",
    "    # ---------------------------------------------------------------------------\n",
    "\n",
    "\n",
    "\n",
    "def main():\n",
    "    \"\"\"\n",
    "    Supervised learning workflow: data processing, feature selection, model training and tuning, model saving and model evaluation.\n",
    "    If you are not satisfied with the trained model, retrain it by changing the data processing, the feature selection, or the model type and parameters until you are satisfied.\n",
    "    Once you are satisfied with your model, you can submit it for testing.\n",
    "    :return:\n",
    "    \"\"\"\n",
    "    data_path = \"\"  # path to the dataset\n",
    "    \n",
    "    save_model_path = ''  # path and file name for saving the model\n",
    "\n",
    "    # Load and preprocess the data\n",
    "    feature1,feature2,label = processing_data(data_path)\n",
    "   \n",
    "    # Feature selection\n",
    "    new_features,label = feature_select(feature1, feature2, label)\n",
    "   \n",
    "    # Split the data\n",
    "    X_train, X_val, X_test,y_train, y_val, y_test = data_split(new_features,label)\n",
    "    \n",
    "    # Create, train and save the model\n",
    "    search_model(X_train, y_train,X_val,y_val, save_model_path)\n",
    "\n",
    "    # Evaluate the model\n",
    "    load_and_model_prediction(X_test,y_test,save_model_path)\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    main()"
   ]
  },
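  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many ways to fill in the `data_split` stub above. The following is a minimal sketch of a roughly 60/20/20 stratified split. The ratios and the `seed` parameter are illustrative choices, not prescribed by this notebook. Stratification keeps the ratio of 1 and -1 labels similar in every subset, which matters with only 39 samples.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "\n",
    "def data_split(features, labels, val_size=0.2, test_size=0.2, seed=42):\n",
    "    \"\"\"Stratified train/val/test split (hypothetical default ratios).\"\"\"\n",
    "    y = np.asarray(labels).ravel()\n",
    "    # First carve out the test set, keeping the class ratio via stratify\n",
    "    X_rest, X_test, y_rest, y_test = train_test_split(\n",
    "        features, y, test_size=test_size, stratify=y, random_state=seed)\n",
    "    # Then split the remainder; rescale val_size so it is a share of the full data\n",
    "    X_train, X_val, y_train, y_val = train_test_split(\n",
    "        X_rest, y_rest, test_size=val_size / (1 - test_size),\n",
    "        stratify=y_rest, random_state=seed)\n",
    "    return X_train, X_val, X_test, y_train, y_val, y_test\n",
    "\n",
    "\n",
    "# Usage with synthetic data: 39 samples, 20 positives and 19 negatives (assumption)\n",
    "rng = np.random.RandomState(0)\n",
    "features = rng.rand(39, 10)\n",
    "labels = np.array([1] * 20 + [-1] * 19)\n",
    "X_train, X_val, X_test, y_train, y_val, y_test = data_split(features, labels)\n",
    "print(len(X_train), len(X_val), len(X_test))  # prints: 23 8 8\n",
    "```"
   ]
  },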
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
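    "## 3.2 Data processing and feature selection\n",
    "\n",
    "1. Modify your `data processing` and `feature selection` functions and merge them into a single function, `data_processing_and_feature_selecting()`, that processes the back-end test data and selects its features so that your model can be tested and scored.\n",
    "2. The test data has the same format as the medical data you used: an `xlsx` file with multiple test samples and 3 sheets. Sheet `Feature1` holds the 6670-dimensional medical imaging features, sheet `Feature2` holds the 377-dimensional gut features, and sheet `label` holds the sample labels (1 for positive samples, -1 for negative samples).\n",
    "3. When modifying `data_processing_and_feature_selecting()`, apply the feature ranking you obtained on the platform's medical dataset directly to the test data. Do not re-rank the features on the test data, because a second ranking may differ from the original one and bias the test results.\n",
    "4. Fill in your model path and file name, and complete the `predict()` function to produce predictions.\n",
    "5. After clicking `提交结果` (Submit Results) in the left sidebar and then `生成文件` (Generate File), select the cell containing the `data_processing_and_feature_selecting()` and `predict()` functions, i.e. the cell in the **data processing and feature selection** and **model prediction code** answer area.\n",
    "6. Import all necessary packages and third-party libraries (including those imported earlier in this file).\n",
    "7. Load the model you consider best, i.e. fill in the model path as required.  \n",
    "8. **During test submission, the server calls the `data_processing_and_feature_selecting()` and `predict()` functions. Do not change their inputs, outputs, or data types.**  \n",
    "9. Remember to fill in your model path and file name before submitting. If you use an [offline task](https://momodel.cn/docs/#/zh-cn/%E5%9C%A8GPU%E6%88%96CPU%E8%B5%84%E6%BA%90%E4%B8%8A%E8%AE%AD%E7%BB%83%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E6%A8%A1%E5%9E%8B), save the model in the **results** folder."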
   ]
  },
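  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Item 3 above asks you to reuse the feature ranking computed on the training data. The following is a minimal sketch of that idea on synthetic data. The array shapes, the `pearson_score` helper, and `k=5` are assumptions for illustration. `MinMaxScaler` and `SelectKBest` are fitted once on the training data, and only `transform` is called on new data. Test samples are therefore scaled with the training min/max, and the same columns are kept.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import pearsonr\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "\n",
    "\n",
    "def pearson_score(X, y):\n",
    "    # Per-column (signed Pearson r, p-value), used as the SelectKBest score function\n",
    "    res = np.array([pearsonr(col, y) for col in X.T])\n",
    "    return res[:, 0], res[:, 1]\n",
    "\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "X_train = rng.rand(39, 50)  # synthetic stand-in for one feature sheet\n",
    "y_train = np.where(rng.rand(39) > 0.5, 1, -1)\n",
    "X_test = rng.rand(8, 50)  # synthetic unseen samples\n",
    "\n",
    "# Fit the scaler and the selector on the training data only\n",
    "scaler = MinMaxScaler().fit(X_train)\n",
    "selector = SelectKBest(pearson_score, k=5).fit(scaler.transform(X_train), y_train)\n",
    "\n",
    "# Test data: reuse the training min/max and the training feature ranking\n",
    "X_test_selected = selector.transform(scaler.transform(X_test))\n",
    "print(selector.get_support(indices=True), X_test_selected.shape)\n",
    "```\n",
    "\n",
    "The fitted `scaler` and `selector` could be saved with `joblib.dump` next to the model. Note that `SelectKBest` keeps the highest scores, so with a signed correlation as the score, strongly negatively correlated features are not favored."
   ]
  },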
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "============================  **Model prediction code answer area**  ============================\n",
    "<br>\n",
    "\n",
    "\n",
    "Write the **model prediction** code in the code cell below. Do not answer anywhere else."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "deletable": false,
    "select": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.externals import joblib\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "\n",
    "\n",
    "def data_processing_and_feature_selecting(data_path): \n",
    "    \"\"\"\n",
    "    Data processing and feature selection\n",
    "    :param  data_path: path to the dataset\n",
    "    :return: new_features,label: feature data and label data after preprocessing and feature selection\n",
    "    \"\"\" \n",
    "    new_features,label = None, None\n",
    "    # -------------------------- Implement data processing and feature selection here ----------------------------\n",
    "\n",
    "    # ------------------------------------------------------------------------\n",
    "    # Return the selected data\n",
    "    return new_features,label\n",
    "\n",
    "\n",
    "    \n",
    "# -------------------------- Load your best model ---------------------------\n",
    "# Load the model (load the model you consider best)\n",
    "# Note that model_path is a relative path, at the same level as this file.\n",
    "# If your model is my_model.m in the results folder, then model_path = 'results/my_model.m'\n",
    "model_path = None\n",
    "\n",
    "# Load the model\n",
    "model = None\n",
    "\n",
    "# ---------------------------------------------------------------------------\n",
    "\n",
    "def predict(new_features):\n",
    "    \"\"\"\n",
    "    Model loading and prediction\n",
    "    :param  new_features : test data, one of the return values of data_processing_and_feature_selecting\n",
    "    :return y_predict : predicted labels\n",
    "    \"\"\"\n",
    "    # -------------------------- Implement model prediction here ---------------------------\n",
    "    # Predict the label of each sample\n",
    "    y_predict = None\n",
    "\n",
    "    # -------------------------------------------------------------------------\n",
    "    \n",
    "    # Return the predicted labels\n",
    "    return y_predict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
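    "## 3.3 **Example test submission functions**\n",
    "\n",
    "Example: the data is not preprocessed, the two feature sets are simply concatenated, and the trained model is `my_model.m`."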
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jovyan/.virtualenvs/basenv/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: FutureWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.\n",
      "  warnings.warn(msg, category=FutureWarning)\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.externals import joblib\n",
    "\n",
    "\n",
    "def data_processing_and_feature_selecting(data_path): \n",
    "    \"\"\"\n",
    "    Feature selection\n",
    "    :param  data_path: path to the dataset\n",
    "    :return: new_features,label: feature data and label data after preprocessing and feature selection\n",
    "    \"\"\" \n",
    "    # Load the medical data: one DataFrame per sheet\n",
    "    data_xls = pd.ExcelFile(data_path)\n",
    "    data = {}\n",
    "    for name in data_xls.sheet_names:\n",
    "        df = data_xls.parse(sheet_name=name, header=None)\n",
    "        data[name] = df\n",
    "    # Get the data\n",
    "    feature1_raw = data['Feature1']\n",
    "    feature2_raw = data['Feature2']\n",
    "    label = data['label']\n",
    "\n",
    "    # Concatenate both modalities column-wise into one feature table\n",
    "    features = pd.concat([feature1_raw, feature2_raw], axis=1)\n",
    "\n",
    "    new_features = features\n",
    "\n",
    "    # Return the selected data\n",
    "    return new_features, label\n",
    "\n",
    "\n",
    "    \n",
    "# -------------------------- Load your best model ---------------------------\n",
    "# Load the model (load the model you consider best)\n",
    "model_path = 'results/my_model.m'\n",
    "\n",
    "# Load the model\n",
    "model = joblib.load(model_path)\n",
    "\n",
    "# ---------------------------------------------------------------------------\n",
    "\n",
    "def predict(new_features):\n",
    "    \"\"\"\n",
    "    Model loading and prediction\n",
    "    :param  new_features : test data\n",
    "    :return y_predict : predictions\n",
    "    \"\"\"\n",
    "\n",
    "    y_predict = model.predict(new_features)\n",
    "\n",
    "    \n",
    "    # Return the predicted labels\n",
    "    return y_predict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required libraries\n",
    "import warnings\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy.stats import pearsonr\n",
    "from sklearn.externals import joblib\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from sklearn.feature_selection import SelectKBest\n",
    "\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Training set shipped with the notebook; the feature ranking is computed on it once\n",
    "TRAIN_DATA_PATH = 'DataSet.xlsx'\n",
    "SELECT_FEATURE_NUMBER = 5\n",
    "\n",
    "\n",
    "def read_dataset(data_path):\n",
    "    \"\"\"Read the Feature1, Feature2 and label sheets of an xlsx file.\"\"\"\n",
    "    data_xls = pd.ExcelFile(data_path)\n",
    "    data = {name: data_xls.parse(sheet_name=name, header=None) for name in data_xls.sheet_names}\n",
    "    return data['Feature1'], data['Feature2'], data['label']\n",
    "\n",
    "\n",
    "def pearson_score(X, y):\n",
    "    \"\"\"Per-column (signed Pearson r, p-value), used as the SelectKBest score function.\"\"\"\n",
    "    res = np.array([pearsonr(x, y) for x in X.T])\n",
    "    return res[:, 0], res[:, 1]\n",
    "\n",
    "\n",
    "def fit_scaler_and_selector(feature_raw, label):\n",
    "    \"\"\"Fit MinMaxScaler and SelectKBest on the training data of one modality.\"\"\"\n",
    "    scaler = MinMaxScaler().fit(feature_raw)\n",
    "    selected = SelectKBest(pearson_score, k=SELECT_FEATURE_NUMBER).fit(\n",
    "        scaler.transform(feature_raw), np.array(label).flatten()).get_support(indices=True)\n",
    "    return scaler, selected\n",
    "\n",
    "\n",
    "# Rank the features once on the training set, so the test data is never re-ranked\n",
    "_train_f1, _train_f2, _train_label = read_dataset(TRAIN_DATA_PATH)\n",
    "scaler1, select_feature1 = fit_scaler_and_selector(_train_f1, _train_label)\n",
    "scaler2, select_feature2 = fit_scaler_and_selector(_train_f2, _train_label)\n",
    "\n",
    "\n",
    "def data_processing_and_feature_selecting(data_path):\n",
    "    \"\"\"\n",
    "    Data processing and feature selection\n",
    "    :param  data_path: path to the dataset\n",
    "    :return: new_features,label: feature data and label data after preprocessing and feature selection\n",
    "    \"\"\"\n",
    "    # Read the data at data_path (not the hard-coded training file)\n",
    "    feature1_raw, feature2_raw, label = read_dataset(data_path)\n",
    "    # Scale with the training-set min/max instead of refitting on the test data\n",
    "    feature1 = pd.DataFrame(scaler1.transform(feature1_raw))\n",
    "    feature2 = pd.DataFrame(scaler2.transform(feature2_raw))\n",
    "    new_features = pd.concat([feature1[feature1.columns.values[select_feature1]],\n",
    "                              feature2[feature2.columns.values[select_feature2]]], axis=1)\n",
    "    # Return the selected data\n",
    "    return new_features, label\n",
    "\n",
    "\n",
    "model_path = 'results/my_model.m'\n",
    "\n",
    "# Load the model\n",
    "model = joblib.load(model_path)\n",
    "\n",
    "\n",
    "def predict(new_features):\n",
    "    \"\"\"\n",
    "    Model loading and prediction\n",
    "    :param  new_features : test data, one of the return values of data_processing_and_feature_selecting\n",
    "    :return y_predict : predicted labels\n",
    "    \"\"\"\n",
    "    # Predict the label of each sample\n",
    "    y_predict = model.predict(new_features)\n",
    "\n",
    "    # Return the predicted labels\n",
    "    return y_predict\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
